NUCLEIC ACID SEQUENCE ENCODING
OVARIAN ANTIGEN, CA125, AND USES THEREOF

This application claims benefit of U.S. Patent Application
No. 60/290,480, filed on 11 May 2001, the content of which
is hereby incorporated into this application.

The invention disclosed herein was made with government 
support under NIH Grants No. CA52477 and CA08748, from the 
United States Department of Health and Human Services. 
Accordingly, the U.S. Government has certain rights in this 
invention. 

Throughout this application, various references are referred 
to. Disclosures of these publications in their entireties 
are hereby incorporated by reference into this application 
to more fully describe the state of the art to which this 
invention pertains. 

BACKGROUND OF THE INVENTION

CA125 antigen is a serum marker that is used routinely in 
gynecologic practice to monitor patients with ovarian 
cancer. It is a mullerian duct differentiation antigen that 
is overexpressed in epithelial ovarian cancer cells and 
secreted into the blood, although its expression is not 
entirely confined to ovarian cancer. CA125 was first 
identified by Bast and Knapp (1) in 1981 by a monoclonal 
antibody (OC125) that had been developed from mice immunized 
with an ovarian cancer cell line. These investigators 
subsequently developed a radioimmunoassay for the antigen
and showed that serum CA125 levels are elevated in about 80%
of patients with epithelial ovarian cancer (EOC) but in less
than 1% of healthy women (2). Numerous studies since that
time have confirmed the usefulness of CA125 levels in
monitoring the progress of patients with EOC (3-6). Most
reports indicate that a rise in CA125 levels precedes 
clinical detection by about 3 months. During chemotherapy, 
changes in serum CA125 levels correlate with the course of 
the disease. CA125 is being used in the inventors' Medical 
Center, and elsewhere, as a surrogate marker for clinical
response in phase II trials of new drugs. On the other
hand, CA125 is not useful in the initial diagnosis of EOC
because of its elevation in a number of benign conditions
(3, 7). Despite this limitation, CA125 is considered to be
one of the best available cancer serum markers; however, more
information on its molecular nature is needed to fully
explore its potential.

Although CA125 antigen was first detected over 20 years ago,
very little is known about its biochemistry and genetics.
Most biochemical studies have concluded that CA125 is a high
molecular weight glycoprotein, although estimates of its
size range from 200 to 2000 kDa, with smaller "subunits"
being described by some investigators (8-13). Most studies
have shown that CA125 is a mucin-type molecule, but others
have claimed that it is a typical glycoprotein with
asparagine-linked sugar chains (14). Another study claimed
that CA125 is a glycosyl-phosphoinositol-linked glycoprotein
(11). Thus, no consensus emerged from these studies
concerning the biochemical nature of this antigen.
Recently, however, our studies have strongly indicated that
CA125 is a typical mucin molecule with a high carbohydrate
content and a preponderance of serine- and threonine-linked
(O-linked) glycan chains (15, 16). Possibly because of the
mucinous nature of CA125, its peptide moiety has been very
difficult to clone. The only published study on this topic
(17) described the isolation of a novel cDNA, later termed
NBR-1 (18), but this species does not seem to have any of
the biochemical characteristics expected for CA125 and may,
in fact, be a transcription factor. Using a rabbit
antiserum to purified CA125, we have now cloned, by
expression cloning, a long partial cDNA sequence
corresponding to a new mucin species (designated
CA125/MUC16) that is a strong candidate for being the
peptide core of the CA125 antigen.

SUMMARY OF THE INVENTION

The invention disclosed herein provides an isolated nucleic 
acid molecule comprising sequences encoding the CA125 
protein or a portion thereof. This invention also provides
the gene encoding the CA125 protein. 

In addition, this invention provides a vaccine for cancer 
which expresses CA125 protein comprising an appropriate 
amount of the isolated nucleic acid molecules which, when 
expressed, are capable of producing a product which induces 
an immune response to CA125 protein. This invention also 
provides a vaccine for cancer which expresses CA125 protein 
comprising an appropriate amount of a substance which 
induces an immune response to CA125 protein. This invention 
also provides a method for the diagnosis of a cancer which
expresses CA125 by detecting CA125-expressing cells in the
blood or other fluids of patients based on the nucleic acid
sequence which encodes CA125. Furthermore, this invention
provides a method for monitoring the therapy of a cancer
which expresses CA125 by measuring the expression of
CA125-expressing cells in the blood or other fluids of patients
based on the nucleic acid sequence which encodes CA125, a
decrease of either the number of CA125-expressing cells or
level of protein expression in the cell, indicating the
success of the therapy.

In addition, this invention provides a method of producing
CA125 protein comprising steps of: a) constructing a vector
adapted for expression in a cell which comprises the
regulatory elements necessary for expression of nucleic acid
in the cell operatively linked to the nucleic acid encoding
the CA125 protein so as to permit expression thereof; b)
placing the cells of step (a) under conditions allowing the
expression of the CA125 protein; and c) recovering the CA125
protein so expressed.

Finally, this invention provides a nonhuman organism
wherein the expression of CA125 is inhibited.

DETAILED DESCRIPTION OF THE FIGURES

First Series Of Experiments

Fig. 1. SDS-PAGE analysis of purified CA125 sample. The
gel (3% stacking gel and 5% separating gel) was run under
reducing conditions and stained with silver reagent. The
arrowhead indicates the interface between the stacking and
separating gels. The migration positions of molecular
weight markers (in kDa) are shown on the right hand side.
The bracket indicates the region of the gel used to immunize
a rabbit to produce the polyclonal anti-CA125 serum.

Fig. 2. Nucleotide sequence at the 3' end of the B4 clone of
CA125/MUC16. The nucleotide and amino acid sequences for B4
(CA125/MUC16) have been deposited in GenBank™ under
accession number AF361486. Abbreviations: EOC: epithelial
ovarian cancer; mAb: monoclonal antibody; TR: tandem repeat;
PBS: phosphate buffered saline. * indicates a stop codon. A
polyadenylation signal sequence is underlined.

Fig. 3. Deduced amino acid sequence of CA125/MUC16 (B4)
organized to indicate the regions of homology in the tandem
repeats. Clustered serine and threonine residues are
highlighted in white/shade and conserved cysteine residues
in bold/shade. Potential N-linked glycosylation sites (Asn)
are indicated in bold type. The possible transmembrane
region is underlined and the consensus tyrosine
phosphorylation motif is indicated in regular/shade,
indicates residues that are perfectly conserved, except in
the last repeat sequence. - indicates gaps introduced to
preserve the best homology in the repeats.

Fig. 4. Northern blot analysis of expression of
CA125/MUC16 in cancer cell lines. The blot was probed with
a biotin-labeled probe (B53) from the tandem repeat region.
1: SW626 (ovarian cancer); 2: 2774 (ovarian cancer); 3:
SK-OV-3 (ovarian cancer); 4: SK-OV-8 (ovarian cancer); 5:
OVCAR-3 (ovarian cancer); 6: COLO316 (ovarian cancer); 7:
MCF-7 (breast cancer); 8: IMR-3 (neuroblastoma); 9: MKN45
(gastric cancer); 10: MCA (sarcoma). Indicated on the top
of the figure (+ or -) is the expression of CA125 in the
cell line as determined by reactivity with anti-CA125
antibodies. The end-point titers for these cell lines with
mAb OC125 were 1- <1:500; 2- <1:500; 3- <1:500; 4-
1:128,000; 5- >1:256,000; 6- 1:4000; 7- <1:500; 8- <1:500;
9- <1:500; 10- <1:500. Screening with mAb VK-8 gave similar
results. The result of probing the blot with a β-actin
probe is shown in the lower half of the figure. Size
standards are indicated on the left side of the gel.

Fig. 5. Deduced amino acid sequence of B4 polynucleotide
(CA125).

Fig. 6. Nucleotide sequence of B4 polynucleotide (CA125).

Fig. 7. Nucleotide sequence of B30 polynucleotide coding
for a different portion of the CA125 gene.

Fig. 8. Deduced amino acid sequence of B30 polynucleotide
corresponding to a different portion of the CA125 gene.

Fig. 9. Expression analysis of CA125 nucleotide clone. 
This figure is the result of an expression experiment that 
confirms that the sequence actually codes for CA125, as 
recognized by standard antibodies. 

Second Series Of Experiments 

Fig. 10. Schematic showing the protein and nucleotide sequence
of the 3' end of clone B30. Also shown is the region identical
to the 5' region of clone B4. The end of repeat H and the
non-translated region are shown in detail. The stop codon in the
nucleotide sequence is indicated in bold type. Note that
repeats A-H correspond to repeats 7-14 in Fig. 11.

Fig. 11. Nucleotide sequence of MUC16B. 
Fig. 12. Amino acid sequence of MUC16B. 

Fig. 13. Schematic showing relationship of NCBI gene
sequence NT_025133.6 to clone B30 and various expressed
sequence tags and the use of this information in determining
the sequence of MUC16B. Exons are shown as filled boxes and
the orientation of the reading frames (+ or -) is indicated
for each exon.

DETAILED DESCRIPTION OF THE INVENTION

The invention disclosed herein provides an isolated nucleic 
acid molecule comprising sequences encoding the CA125 
protein or a portion thereof. This invention also provides
the gene encoding the CA125 protein. This invention further 
comprises the 5' untranslated sequence of the CA125 gene. In 
addition, this invention comprises the 3' untranslated 
sequence of the CA125 gene. 

In addition, this invention provides the above isolated 
nucleic acid molecule comprising sequence set forth in 
Figure 6, or a portion thereof, and the corresponding CA125 
protein comprising sequence set forth in Figure 5, or a 
portion thereof. Furthermore, this invention provides the
above isolated nucleic acid molecule comprising sequence set 
forth in Figure 7, or a portion thereof, and the 
corresponding CA125 protein sequence set forth in Figure 8, 
or a portion thereof. In an embodiment, the nucleic acid 
comprises sequence set forth in Figure 11, or a portion 
thereof. In another embodiment, the nucleic acid encoding 
protein comprises at least a portion of the amino acid 
sequence set forth in Figure 12, or a portion thereof. 

This invention also provides the above gene comprising 
sequence set forth in Figure 10, or a portion thereof. 

The invention furthermore provides the above isolated
nucleic acid molecules, wherein the nucleic acid is RNA,
cDNA, genomic DNA, or synthetic DNA. This invention also
provides a vector comprising the above nucleic acid
molecule. In an embodiment, the vector is designated as
pBK-CMV-B4 comprising sequence set forth in Figure 6, or a
portion thereof, and the corresponding CA125 protein
comprising sequence set forth in Figure 5, or a portion
thereof. In another embodiment, the vector is designated as
pBK-CMV-B30 comprising sequence set forth in Figure 7, or a
portion thereof, and the corresponding CA125 protein
comprising sequence set forth in Figure 8, or a portion
thereof. In yet another embodiment, the vector is designated
as pCMV-Tag-B4 comprising sequence set forth in Figure 6, or
a portion thereof, and the corresponding CA125 protein
comprising sequence set forth in Figure 5, or a portion
thereof. In a further embodiment, the vector is designated
as pCMV-Tag-B30 comprising sequence set forth in Figure 7,
or a portion thereof, and the corresponding CA125 protein
comprising sequence set forth in Figure 8, or a portion
thereof.

This invention provides an expression system comprising the
above vector. In an embodiment, the system is a eukaryotic
or prokaryotic system. This invention further provides a
method for producing CA125 protein comprising the above
expression system.

This invention further provides an isolated nucleic acid
molecule comprising sequence capable of specifically
hybridizing to the sequences above. In an embodiment, the
nucleic acid molecule is capable of inhibiting the
expression of the CA125 protein. This invention also provides
a method of inhibiting expression of CA125 inside a cell by
vector-directed expression of a short RNA able to hybridize
with the protein-coding RNA of CA125. In another embodiment,
the nucleic acid molecule is at least a 7mer. In another
embodiment, it is at least a 10mer. In a separate
embodiment, the nucleic acid molecule is at least a 20mer.
In a further embodiment, the sequence is unique.
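
By way of illustration only, one possible reading of a "unique" hybridizing sequence of a given length can be sketched in Python as follows. This is a minimal sketch and not the inventors' procedure: it assumes that uniqueness is judged only within a single target transcript, and the function names and the toy RNA string are hypothetical and are not taken from the CA125 sequence.

```python
from collections import Counter

# Translation table for RNA base pairing (A-U, G-C).
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense(rna: str) -> str:
    """Return the reverse complement of an RNA string, written 5'->3'."""
    return rna.upper().translate(_COMPLEMENT)[::-1]


def unique_kmers(transcript: str, k: int) -> list[str]:
    """Return every k-mer that occurs exactly once in the transcript.

    Assumption: uniqueness is checked only against this one transcript,
    not against a whole genome or transcriptome.
    """
    seq = transcript.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [kmer for kmer, n in counts.items() if n == 1]


if __name__ == "__main__":
    toy_transcript = "AUGGCUAGCUAGGCAUCGAUCG"  # hypothetical toy input
    for target in unique_kmers(toy_transcript, 7):
        # The oligonucleotide that would hybridize to each unique 7-mer.
        print(target, "->", antisense(target))
```

In practice, a candidate of 7, 10 or 20 nucleotides would also have to be compared against other expressed sequences, which this sketch does not attempt.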

This invention further provides a method to detect ovarian
cancer in a subject comprising steps of: a) contacting the
above isolated nucleic acid molecule with RNA from a sample
from the subject under conditions permitting the formation
of a hybrid complex, and b) detecting the hybrid complex,
wherein a positive detection indicates the expression of the
antigen and presence of cancer.

Furthermore, this invention provides a method of monitoring 
ovarian cancer therapy in a subject comprising steps of: a) 
contacting the above isolated nucleic acid molecule with RNA
from a sample from the subject under conditions permitting 
the formation of a hybrid complex, and b) measuring the 
amount of the hybrid complex, wherein a decrease in the 
hybrid complex indicates the success of therapy. 

This invention also provides a method for inhibiting the 
expression of the CA125 protein comprising contacting an 
appropriate amount of the above nucleic acid molecule so 
that hybridization of the gene or transcript encoding the 
CA125 protein will occur, thereby inhibiting the expression 
of the protein. This invention further provides a 
composition comprising the above isolated nucleic acid
molecule.



In addition, this invention provides a vaccine for a cancer 
which expresses CA125 protein comprising an appropriate 
amount of the above isolated nucleic acid molecules. 

In a separate embodiment, this invention provides a vaccine 
for a cancer which expresses CA125 protein comprising an 
appropriate amount of the isolated nucleic acid molecules 
which, when expressed, are capable of producing a product 
which induces an immune response to CA125 protein. In an 
embodiment, the nucleic acid molecule comprises sequences 
encoding human CA125 protein or a portion thereof. 

In another embodiment, the expressed human sequence is
linked to a carrier. It is known that a carrier can boost
the immune response. The carrier may be a protein carrier.

In yet another embodiment, the nucleic acid molecule
comprises a nonhuman sequence. In a further embodiment, the
nucleic acid molecule comprises a primate sequence. In an
additional embodiment, the nucleic acid molecule comprises a
murine sequence. In a further embodiment, it comprises a rat
or mouse sequence. In yet another embodiment, the nucleic
acid molecule comprises a synthetic sequence, which, when
expressed, is capable of producing a product which induces
an immune response to CA125 protein.

In addition, this invention provides the vaccine wherein the
sequence hybridizes with or is homologous to the sequences
encoding human CA125 protein. In an embodiment, the vaccine
further comprises a suitable adjuvant. In an embodiment,
the adjuvant is an alum. In another embodiment, the cancer
is an ovarian, pancreatic, breast, endometrial, or lung
carcinoma.

This invention also provides a method to treat a cancer 
which expresses CA125 in a subject comprising administering 
to the subject an appropriate amount of the above vaccine. 

This invention also provides the above method, wherein the 
cancer is an ovarian, pancreatic, breast, endometrial, or 
lung carcinoma. 

This invention further provides a vaccine for a cancer which 
expresses CA125 comprising an appropriate amount of the 
expressed CA125 protein corresponding to the above sequence. 

This invention also provides a vaccine for a cancer which 
expresses CA125 protein comprising an appropriate amount of 
a substance which induces an immune response to CA125 
protein. In an embodiment, the substance is a polypeptide or 
a peptide. In a separate embodiment, the polypeptide 
comprises sequences encoding human CA125 protein or a 
portion thereof. In yet another embodiment, the expressed 
human sequence is linked to a carrier. In a further 
embodiment, the polypeptide comprises a nonhuman sequence. 
In a separate embodiment, the polypeptide comprises a 
primate sequence. In another embodiment, the polypeptide 
comprises a murine sequence. In yet another embodiment, the 
polypeptide comprises a synthetic sequence, which, when 
expressed, is capable of producing a product which induces 
an immune response to CA125 protein. The production of a
synthetic sequence or a hybrid of synthetic and natural
sequences is well-known in this field. In a separate
embodiment, the vaccine further comprises a suitable
adjuvant. In an embodiment, the adjuvant is an alum.

This invention provides the above vaccine, wherein the 
expressed protein is conjugated to a protein carrier to 
increase the immunogenicity. Furthermore, this invention
provides the above vaccine, wherein the cancer is an 
ovarian, pancreatic, breast, endometrial, or lung carcinoma. 

Furthermore, this invention provides a method to treat a 
cancer which expresses CA125 in a subject comprising 
administering to the subject an appropriate amount of the 
above vaccine. 

This invention also provides a method to prevent a cancer 
which expresses CA125 in a subject comprising administering 
to the subject an appropriate amount of the above vaccine. 
In an embodiment, the cancer is an ovarian, pancreatic, 
breast, endometrial, or lung carcinoma. 

In addition, this invention provides a method for the 
diagnosis of a cancer which expresses CA125 by detecting 
CA125-expressing cells in the blood or other fluids of 
patients based on the nucleic acid sequence which encodes 
CA125. 



This invention also provides a method for monitoring the
therapy of a cancer which expresses CA125 by measuring the
expression of CA125-expressing cells in the blood or other
fluids of patients based on the nucleic acid sequence which
encodes CA125, a decrease of either the number of
CA125-expressing cells or level of protein expression in the
cell, indicating the success of the therapy. In an embodiment,
the detection is based on polymerase chain reaction with
appropriate primers.

This invention further provides a method of producing CA125
protein comprising steps of: a) constructing a vector adapted
for expression in a cell which comprises the regulatory
elements necessary for expression of nucleic acid in the
cell operatively linked to the nucleic acid encoding the
CA125 protein so as to permit expression thereof; b) placing
the cells of step (a) under conditions allowing the
expression of the CA125 protein; and c) recovering the CA125
protein so expressed. In an embodiment, the cell type is
selected from the group consisting of bacterial cells, yeast
cells, insect cells, and mammalian cells.

This invention also provides the CA125 protein expressed by the
above method. This invention also provides a method for
production of antibodies against CA125 protein using the
protein. This invention also provides the antibodies produced
by the above method. This invention also provides a method of
diagnosis of cancer which expresses CA125 using the antibodies
above, and a method for monitoring the therapy of cancer which
expresses CA125 using the above antibodies.

This invention further provides a method for determining the
immunoreactive part of CA125 comprising contacting
antibodies which are known to be reactive to CA125 with the
protein above. Furthermore, this invention provides a
transgenic nonhuman organism comprising the above isolated
nucleic acid molecule. In an embodiment, the organism is a
transgenic nonhuman mammal.

This invention also provides a nonhuman organism, wherein 
the expression of CA125 is inhibited. In an embodiment, the 
organism is a nonhuman mammal. In a separate embodiment, the 
mammal is a mouse.

Finally, this invention further provides a method for 
screening a compound for treatment of cancer which expresses 
CA125 protein comprising administering the compound to the 
transgenic nonhuman organism above, a decrease in expression
of CA125 protein indicating that the compound may be useful 
for treatment of the cancer. In an embodiment, the cancer is 
an ovarian, pancreatic, breast, endometrial, or lung 
carcinoma.

The invention will be better understood by reference to the
Experimental Details which follow, but those skilled in the
art will readily appreciate that the specific experiments
detailed are only illustrative, and are not meant to limit
the invention as described herein, which is defined by the
claims which follow thereafter.

CA125 is an ovarian cancer antigen that is the basis for a
widely used serum assay for the monitoring of patients with
ovarian cancer; however, detailed information on its
biochemical and molecular nature is lacking. The inventors
now report the isolation of a long, but partial, cDNA that
corresponds to the CA125 antigen. A rabbit polyclonal
antibody produced to purified CA125 antigen was used to
screen a λZAP cDNA library from OVCAR-3 cells in Escherichia
coli. The longest insert from the 53 positive isolated 
clones had a 5965 b.p. sequence containing a stop codon and 
a poly A sequence but no clear 5' initiation sequence. The 
deduced amino acid sequence has many of the attributes of a 
mucin molecule and was designated CA125/MUC16. These 
features include a high serine, threonine, and proline 
content in an N-terminal region of nine partially conserved 
tandem repeats (156 amino acids each) and a C-terminal 
region non-tandem repeat sequence containing a possible 
transmembrane region and a potential tyrosine 
phosphorylation site. Northern blotting showed that the 
level of MUC16 mRNA correlated with the expression of CA125 
in a panel of cell lines. The molecular cloning of 
CA125/MUC16 antigen will lead to a better understanding of 
its role in ovarian cancer. 

EXPERIMENTAL DETAILS 

First Series of Experiments

Materials and Methods 

The NIH:OVCAR-3 cell line was obtained from the American Type
Culture Collection (Rockville, MD). Anti-CA125 antibody mAb
OC125 was a generous gift from Dr. R. Bast, Jr. mAb VK-8,
developed in the inventors' laboratory by immunization of
mice with human ovarian cancer cell line OVCAR-3, also
identifies CA125 but reacts with a different epitope(s) than
OC125 (15). Tumor cell lines were from the Sloan-Kettering
Institute Cell Bank.

Purification of CA125 Antigen 

CA125 was purified from the culture supernatant of
NIH:OVCAR-3 cells in a simple two-step procedure (15).
Briefly, the cells were cultured as a monolayer in a
synthetic medium (ITS, Life Technologies, Grand Island, NY),
in RPMI medium containing 1% fetal bovine serum (FBS) and
the culture medium was harvested every 7 days. A total of
31 liters of supernatant medium was concentrated 10-fold and
precipitated with perchloric acid (0.6 N final
concentration). After centrifuging, the neutralized
supernatant was passed through a column of normal mouse
Ig-agarose (30 ml; 1.0 mg/ml) and then through a column of VK-8
mAb (80 ml; 2.0 mg/ml). The antibodies were linked to
Actigel ALD gel according to the manufacturer's directions
(Sterogene Bioseparations, Inc., Carlsbad, CA). The VK-8
column was washed at 4 °C with PBS, then with 1 M NaCl in PBS,
and finally eluted with 3 M MgCl2. Fractions (6.0 ml) were
collected and assayed for CA125 antigen by ELISA with mAb
VK-8 as described (15). Fractions from the MgCl2 eluate
containing CA125 reactivity were pooled and used in
subsequent studies. Analysis by SDS-PAGE and silver
staining (Fig. 1) showed that the sample consisted of very
high molecular weight components migrating in the stacking
gel and in a region just below the gel interface; all these
species were reactive with mAb OC125 (data not shown). The
sample also contained a lower molecular weight species
originating from the FBS used in the cell cultures. The
amino acid content of the sample was determined as described
previously (15).

Production of a Rabbit Antiserum to CA125 Antigen

The CA125 sample was further purified by preparative
SDS-PAGE and the high molecular weight region of the gel
indicated in Fig. 1 was excised. After homogenization in
incomplete Freund's adjuvant the gel was used to immunize a
rabbit (NZB white, female) by 3 subcutaneous injections, 1
week apart, in 8 sites. Serum was obtained from the rabbit
10 days after the final immunization. An aliquot (3.0 ml)
of the serum was absorbed with a pellet of melanoma cells
(SK-MEL-28, -23, -30 and -33; 6.7 ml) that had been treated
with 0.2% NP40 and 0.1% protease inhibitor cocktail (Sigma
Co., St. Louis, MO) and the absorbed serum was used to
screen a cDNA library.

Screening of OVCAR-3 cDNA Library

A cDNA library was constructed from OVCAR-3 mRNA in the λZAP
Express vector in E. coli as described by the manufacturer
(Stratagene, La Jolla, CA). The library contained 7.5 × 10^6
p.f.u. The library was plated onto 15 plates at
approximately 30,000 pfu/150 mm plate and plaques were
transferred to nitrocellulose and screened with the absorbed
rabbit antiserum (1:500). Positive plaques were identified
using anti-rabbit Ig-horseradish peroxidase conjugate
(Southern Biotechnology Assoc., Birmingham, AL) and
4-chloro-1-naphthol reagent. After subcloning three times and
retesting with antiserum, 54 positive clones remained.
These clones contained inserts ranging from 1.5 to >4.0 kbp
and were designated pBK-CMV-B1 to B54.
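
The scale of the screen can be worked out from the figures given above. The following Python sketch is illustrative arithmetic only: it treats the approximate plating density as exact and the stated library size as the pool being sampled, and the function and constant names are hypothetical.

```python
# Figures stated in the text above.
LIBRARY_SIZE_PFU = 7.5e6   # total library size (p.f.u.)
PLATES = 15                # number of 150 mm plates
PFU_PER_PLATE = 30_000     # approximate plaques per plate
POSITIVE_CLONES = 54       # positives remaining after three rounds of subcloning


def plaques_screened(plates: int, pfu_per_plate: int) -> int:
    """Total number of plaques transferred to nitrocellulose."""
    return plates * pfu_per_plate


def fraction(part: float, whole: float) -> float:
    """Simple ratio, kept as a function for readability."""
    return part / whole


screened = plaques_screened(PLATES, PFU_PER_PLATE)
library_fraction = fraction(screened, LIBRARY_SIZE_PFU)
positive_rate = fraction(POSITIVE_CLONES, screened)

print(f"plaques screened: {screened}")
print(f"fraction of library sampled: {library_fraction:.1%}")
print(f"positive clones per plaque screened: {positive_rate:.4%}")
```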

DNA Sequencing and Sequence Analysis 

The nucleotide sequence of the longest insert (B4) was
determined using Big Dye terminators (PE Biosystems) and run
on an ABI 3700 or ABI 377 DNA sequencer by the Cornell
University BioResource Center, Ithaca, NY. Using the T3
primer and then a series of internal sequencing primers,
corresponding to less conserved regions of the gene, a 5965
bp sequence was identified in B4. Partial sequencing of the
other inserts demonstrated that the majority corresponded to
different parts of the B4 sequence.

Northern Blot Analysis 

mRNA was isolated from a panel of human tumor cell lines,
which had been serologically typed for CA125 expression,
using an mRNA Isolation System kit (Invitrogen, Carlsbad,
CA). mRNA samples (3 µg) were denatured with formaldehyde,
separated by electrophoresis in 1.0% agarose and transferred
to nylon sheets (Gene Screen Plus, NEN, Boston, MA). The
blot was hybridized with a biotin-labeled probe from an
insert containing 3 tandem repeat regions (B53) using a
chemiluminescence procedure following the manufacturer's
directions (Renaissance reagent; NEN, Boston, MA).

Serological Analysis 

Tumor cell lines were assayed for CA125 expression with mAb
OC125 and VK-8 using a red cell rosetting method as
described previously (15).

RESULTS

Cloning of CA125/MUC16 cDNA

Although most studies on the molecular cloning of mucins
utilized polyclonal antisera raised to the deglycosylated
mucin (apomucin), in this study we used a rabbit antiserum
prepared against the native CA125 antigen. CA125 was
purified by affinity chromatography on an anti-CA125
antibody (mAb VK-8) column by elution under mild conditions
with a chaotropic ion (3 M MgCl2) as described previously
(15). The purified sample had an amino acid composition
similar to that found in other mucins (Table 1) and
extremely high CA125 activity (2 × 10^5 units/mg protein). To
immunize rabbits the preparation was further purified by
SDS-PAGE and gel slices containing high molecular weight
CA125 antigen (Fig. 1) were used as the immunogen (in
incomplete Freund's adjuvant). The resulting antiserum was
absorbed with a pellet of non-ovarian cancer cells, after
partially solubilizing the cells in 0.2% NP-40, to remove
non-specific antibodies.
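
The comparison in Table 1 rests on amino acid composition expressed in moles %. As a minimal sketch of how a deduced composition of that kind might be computed from a protein sequence, the following Python example may be helpful. It is offered as an illustration only: the function names are hypothetical, and the short Ser/Thr/Pro-rich peptide is a toy input rather than the CA125/MUC16 sequence.

```python
from collections import Counter


def mole_percent(protein: str) -> dict[str, float]:
    """Return the mole percent of each residue (one-letter code) in a sequence."""
    seq = protein.upper()
    counts = Counter(seq)
    total = len(seq)
    return {aa: 100.0 * n / total for aa, n in counts.items()}


def ser_thr_pro_percent(protein: str) -> float:
    """Combined mole percent of Ser, Thr and Pro, the residues typical of mucins."""
    composition = mole_percent(protein)
    return sum(composition.get(aa, 0.0) for aa in "STP")


if __name__ == "__main__":
    toy_peptide = "SSVPTTSTP"  # toy input for illustration only
    for residue, percent in sorted(mole_percent(toy_peptide).items()):
        print(f"{residue}: {percent:.1f} moles %")
    print(f"Ser+Thr+Pro: {ser_thr_pro_percent(toy_peptide):.1f} moles %")
```

A composition measured by hydrolysis of the purified antigen and one deduced from a cDNA sequence are obtained differently, so agreement between them is supporting rather than conclusive evidence.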

Table 1. Comparison of Amino Acid Content of Purified CA125 and Deduced
Amino Acid Composition of CA125/MUC16 and Its Tandem Repeat Region

Amino Acid    Purified CA125    CA125/MUC16    CA125/MUC16(TR)
              (moles %)         (moles %)      (moles %)

Asn           8.5               8.9            8.1
Glx           7.8               8.1            7.5
Ser           11.0              8.7            8.9
Gly           9.0               7.4            7.6
His           2.6               2.8
Arg           4.6               5.9            6.3
Thr           12.4              11.6           12.7
Ala           3.8               3.1            2.9
Pro           8.7               8.1            9.0
Tyr           2.6               3.8            3.3
Val           5.2               5.0            4.7
Met           1.2               1.1            1.0
Cys                             1.4            1.2
Iso           2.7               3.3            3.1
Leu           12.4              13.4           13.7
Phe           3.7               3.9            3.6
Lys           3.8               3.0            2.9

The absorbed antiserum was used to screen a λZAP cDNA
library from OVCAR-3 cells expressed in E. coli. Fifty-four 
positive clones were detected and 53 inserts were sequenced. 
Initial sequencing of the longest clone (B4) showed that it 
had 9 partially conserved repeats of 495 b.p. each and a 
short non-repetitive 3' region. Further sequencing with 
internal primers extended the 3' end of the sequence to 
include a stop codon, a polyadenylation signal and a poly A 
region for a total of 5965 b.p. (Fig. 2). No clear
initiation sequence (ATG in a Kozak box) was detected at the
5'-end, indicating that the derived sequence is incomplete.
The majority of the other inserts (B1-B53) had sequences
derived from different parts of the B4 sequence. No clones
containing only 3' non-repetitive sequences were identified.
Searching GenBank™ revealed no related full-length cDNA, but
numerous related human ESTs (including Accession Numbers:
AI566650, AI537678, AI276341, AI923224, AI276341, AU158364,
AU140211, AK024365) and one mouse EST (AK003577) were
detected. With minor exceptions, these sequences were
identical to those derived for B4. The nucleotide sequence
of B4 was designated CA125/MUC16.

Chromosomal Location of CA125/MUC16 Sequences

Comparison of the B4 sequence with the working draft version
of the human genome, available from the NCBI, located
homologous sequences on chromosome 19 (p13.3 region). As
sequencing of this region is incomplete and presently
consists of numerous unordered segments of varying lengths,
more complete genomic information must await the
availability of further sequencing data.

Analysis of the Deduced Amino Acid Sequence of CA125/MUC16 

The nucleotide sequence was conceptually translated into an amino
acid sequence assuming initiation at the ATG of the β-galactosidase
gene in the vector. The deduced amino acid sequence of 1890
amino acids (Fig. 3) suggested a mucin-type molecule. It had
an amino acid composition that was moderately high in serine
(8.9%), threonine (12.5%) and proline (8.8%); this composition
is very similar to that of the purified CA125 sample used in
this study (Table 1), although the proportion of these three
amino acids is lower than in most other mucins. The sequence
contained a large region of 9 tandem repeats (TR) of 165 amino
acids each and a C-terminal non-repetitive region of 537 amino
acids. None of the 9 repeats are identical, but numerous
perfectly conserved residues and short sequences are apparent
(Fig. 3). Two conserved cysteine residues within the TRs are
notable. The serine and threonine residues are scattered
throughout the sequence, but the TR regions have prominent
clusters of Ser and Thr, often with adjacent Pro residues, which
is a common feature of O-glycosylation sites (19), e.g.
SSVPTTSTP (47-55 and 671-679) and SSVSTTSTTSTP (1139-1147).
These characteristics are typical of mucins. The high Leu
content of this sequence is, however, not found in other cloned
mucins. Other features of interest include a sequence of
hydrophobic amino acids (25 residues) towards the C-terminal
end (presumably representing a transmembrane region) and a
short 31-amino-acid cytoplasmic tail. This region also
contains a consensus tyrosine phosphorylation site (RRKKEGEY;
refs. 20, 21). Numerous potential N-linked glycosylation sites
occur in both the TR and non-TR regions (Fig. 3).
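
The composition figures quoted in the preceding paragraph (mole percent of serine, threonine and proline) can be recomputed from any one-letter protein sequence. The following is a minimal sketch only; the function name mole_percent is hypothetical, and the input is the short Ser/Thr/Pro-rich motif quoted in the text, used purely for illustration, not the full 1890-residue deduced sequence.

```python
from collections import Counter


def mole_percent(protein):
    """Return {residue: mole percent} for a one-letter protein sequence."""
    seq = protein.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


# Illustrative input: a motif quoted in the text, not the whole protein.
motif = "SSVPTTSTP"
composition = mole_percent(motif)
stp_content = composition["S"] + composition["T"] + composition["P"]
print(f"S+T+P content of {motif}: {stp_content:.1f}%")
```

Applied to a complete deduced sequence, the same function would give values comparable with the deduced-composition columns of Table 1, assuming the sequence contains only the twenty standard one-letter codes.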

Northern Blotting 

mRNA from a panel of ten CA125+ and CA125- cell lines was
screened with a probe derived from the tandem repeat region
of MUC16. Three of the cell lines gave positive blots and 7
were unreactive (Fig. 4). The polydisperse pattern obtained
is typical of that observed with other mucin mRNAs. These
data corresponded to the expression of CA125 antigen on the
cell lines as determined by serological analysis with
antibodies to CA125 (mAbs OC125 and VK-8). The strongest
signal was given by mRNA from OVCAR-3 (lane 5), the cell
line from which the CA125 was purified and the cDNA library
was produced.

Peptide Sequences Derived from CA125 Antigen

Purified CA125 was deglycosylated by treatment with
anhydrous HF at room temperature for 3 hrs (22). Two
sequences were obtained from a tryptic digest of the
HF-treated sample after SDS-PAGE and transfer of the 25-35 kDa
region to a nitrocellulose membrane (22). The product was
also digested with Lys-C in guanidinium hydrochloride;
peptides were isolated by microbore HPLC, and four peptides
were successfully sequenced (Table 2). Five of these six
peptides corresponded to sequences within the TR and one to
a sequence in the C-terminal region of the deduced MUC16
sequence (Table 2).

Table 2. Amino Acid Sequences Derived from Purified CA125

Sequence            Position in CA125/MUC16 sequence

By Lys-C digestion
AQPGTTNYQRNK        1722-1733
SPRLDR              1098-1113
pLFK                120-123, and other locations
pGL                 7-9, and other locations

By trypsin digestion
KAQPGTTNYQRN        1721-1732
RTPDTSTMHLATSRT     833-847

EXPRESSION ANALYSIS OF CA125 NUCLEOTIDE CLONE (FIG. 9) 

This figure is the result of an expression experiment that 
confirms that the sequence actually codes for CA125, as 
recognized by standard antibodies. 

Method 

Clone B53 (in pCMV-tag vector) was transfected into SK-OV-3
(a CA125-negative cell line) with Lipofectamine Plus reagent.
Stable clones were selected with neomycin. Cells were
radiolabeled with [3H]glucosamine and immunoprecipitated with
antibodies, and the products were analyzed by SDS-PAGE and
autoradiography.

Result 

Lane 1 (mAb OC125) and lane 2 (mAb VK-8) have bands at the
top of the gel showing the presence of CA125 antigen in the
transfected cells. No bands were obtained with normal mouse
serum (negative control).

This result proves that the cloned nucleotide sequence
contains the coding information for the CA125 antigen.

DISCUSSION

Based on the following evidence, the cloned MUC16 sequence
is a strong candidate for being the cDNA for the peptide
core of the CA125 antigen: (i) the CA125 antigen used in
the study was isolated by affinity chromatography on an
anti-CA125 monoclonal antibody column and was highly
purified, (ii) peptides isolated from the purified CA125
sample corresponded to sequences in the cloned MUC16
sequence and (iii) MUC16 mRNA levels in a panel of cancer
cell lines, as determined by Northern blotting, correlated
with the expression of CA125 in the cell lines as determined
serologically. Moreover, this result supports earlier
biochemical studies that had concluded that CA125 antigen is
a mucin-type molecule (15). The cloned sequence is
therefore designated as CA125/MUC16. This gene has been
provisionally localized to chromosome 19p13.3. Initially
reported sequences of mucins are rarely full length because
of the extremely large size of mucin mRNAs and, not
unexpectedly, no apparent 5' initiation signal is evident in
the CA125/MUC16 cDNA sequence. The sequence is believed to
be complete at the 3'-end, as a stop codon, a polyadenylation
site and a poly A tail have been identified (Fig. 2).

Mucins are notoriously difficult to clone because of their
complex structure and high degree of glycosylation. Most
successful cloning efforts have resulted from screening cDNA
libraries with a polyclonal antiserum produced to the
deglycosylated mucin (reviewed in 23-27). Thirteen human
mucins have been cloned or partially cloned to date (MUC-1,
-2, -3, -4, -5AC, -5B, -6, -7, -8, -9, -11, -12 and -13;
refs. 23-29). In this study, however, a polyclonal
antiserum to the native mucin was used to isolate a cDNA
corresponding to the peptide moiety of CA125/MUC16 antigen.
This approach may have been successful because of the
relatively low content of serine and threonine (representing
potential O-glycosylation sites) in CA125/MUC16 in
comparison with most other mucins. The high degree of
purity of the isolated antigen, as well as the use of a
highly absorbed antiserum and the high expression of CA125
in the OVCAR-3 cell line used to produce the cDNA library,
may also have been key factors in obtaining positive clones.

The deduced amino acid sequence of CA125/MUC16 resembles
other mucins in having serine, threonine and proline as
major amino acids; however, its high content of leucine is
characteristic of MUC16. The presence of tandem repeats is
also typical of mucins, but the length of the repeat units
(156 amino acids) is unusual, with only MUC6 having longer
tandem repeats (30). Nine TRs have been identified thus
far, with the last repeat being shorter than the others.
The amino acid sequences in the TRs are not perfectly
conserved, although 81 positions have conserved amino acids
and certain motifs, e.g. GPLYSCRLTLLR, ELGPYTL, FTLNFTIXNL
and PGSRKFNXT, are found in all or most of the TRs. Two
closely spaced cysteine residues (20 amino acids apart),
which could form interchain disulfide bonded loops in the
structure, are also perfectly conserved.
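
A count of conserved positions such as the one given in the preceding paragraph can be obtained by comparing aligned repeat units column by column. The following is a minimal sketch only, assuming the repeats have already been aligned to equal length; the function name conserved_positions and the three six-residue toy repeats are hypothetical and are not taken from Fig. 3.

```python
def conserved_positions(aligned_repeats):
    """Return 1-based columns at which every aligned repeat has the same residue."""
    lengths = {len(repeat) for repeat in aligned_repeats}
    if len(lengths) != 1:
        raise ValueError("repeats must be aligned to one common length")
    return [
        column_number
        for column_number, column in enumerate(zip(*aligned_repeats), start=1)
        if len(set(column)) == 1
    ]


# Hypothetical toy repeats, for illustration only.
toy_repeats = ["GPLYSC", "GPLFSC", "GALYSC"]
print(conserved_positions(toy_repeats))
```

Note that a column consisting entirely of gap characters would also be reported as conserved by this simple comparison, so gapped alignments would need an extra filter.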

Serine and threonine residues, representing potential
O-glycosylation sites, are scattered throughout the sequence,
but blocks of clustered Ser and Thr residues are evident in
the TR region. These regions have adjacent or nearby Pro
residues, a motif that is frequently found in
O-glycosylation sites (19). One short serine/threonine-rich
sequence (PTSSSST) is also found in the C-terminal non-TR
region. Numerous potential N-glycosylation sites
(Asn-X-Ser/Thr, where X is any amino acid except Pro) are also
found in the sequence, including two that are perfectly
conserved in the TR region. It is unlikely, however, that
many of these sites are used, as the content of N-linked
glycan chains in purified CA125 is very low (15). It is
also interesting to note that the sequence contains numerous
lysine and arginine residues that are remote from the
postulated O-glycosylation sites and which could explain the
sensitivity of CA125 to trypsin digestion (16). A search
for conserved domains at the NCBI BLAST site revealed the
presence of six SEA domains in the deduced protein sequence.
The significance of this finding is unclear. Five of the
domains are in the tandem repeat region and one is in the
non-tandem repeat region (amino acids 1709-1768). SEA
domains were originally described as being characteristic of
membrane-bound proteins with high levels of O-glycosylation
(31); CA125/MUC16 certainly fits this description.
Recently, it has been suggested that they also designate
regions susceptible to proteolytic cleavage (32).

Two features of the non-TR region are particularly
interesting. The first is the presence of a 25-amino-acid
block of hydrophobic amino acids which could represent a
membrane-spanning region. Transmembrane (TM) motifs have
been found in five other mucins (MUC-1, -3, -4, -12 and -13).
The remainder of the mucins that have been cloned lack TM
regions and instead have cysteine-rich regions with homology
to von Willebrand factor (27). Members of this family of
mucins are secreted and form gels that protect and lubricate
epithelial tissues. CA125 is also secreted from ovarian
tumors and cell lines, but the mechanism for its secretion is
unclear. Two possibilities can be suggested: (i) a
proteolytic event, possibly in the C-terminal SEA domain,
cleaves off the luminal N-terminal domain (as in MUC1, refs.
33, 34) or (ii) alternatively-spliced mRNAs are generated
that lack the TM region. Indeed, recent sequencing of
clones B30 and B22 indicates the existence of such sequences
(data not shown). The second feature of interest in the
non-TR sequence is a short cytoplasmic tail (31 amino acids)
that contains a putative tyrosine phosphorylation site
(RRKKEGEY). This sequence is conserved in the translated
mouse EST (AK003577) that has homology with CA125/MUC16 at
the C-terminal end. MUC-1 has several tyrosine residues in
its cytoplasmic tail and at least one of these is
phosphorylated in vivo (35, 36). One of the Tyr residues in
MUC1 occurs in a YTKTP sequence, a motif that is responsible
for binding to SH2 domains in proteins involved in
intracellular signaling. The putative phosphorylation site
found in CA125/MUC16 was first recognized in src family
proteins (19, 20). Whether or not this tyrosine residue is
phosphorylated in CA125 antigen is not known. Fendrick et
al. (37) reported the presence of phosphate in CA125 from
WISH cells by labeling with 32PO4(2-) and immunoprecipitation
analysis but concluded that the phosphorylation site(s) are
on Ser or Thr. Significantly, however, the secretion of
CA125 is stimulated by epidermal growth factor (EGF),
presumably through the EGF receptor, which is a well-known
tyrosine kinase (37). The possibility that CA125/MUC16 is
phosphorylated on tyrosine and is involved in intracellular
signaling needs further investigation. Interestingly, no
EGF domains, which are found in some other mucins (MUC3,
MUC4, MUC12 and MUC13), were located in CA125 (MUC16).

The molecular cloning of CA125 antigen opens the way to a
better understanding of this important antigen, including
its physiological function and its role in the biology of
ovarian cancer. Of immediate interest will be the
identification of the epitope(s) recognized by the various
monoclonal antibodies that recognize CA125 (38). The
identification of tandem repeats in the MUC16/CA125
structure is consistent with the use of a single monoclonal
antibody in double-determinant assays for CA125 levels,
which would indicate that the antigen has multiple,
identical epitopes (2). Such studies could lead to
improvements in the CA125 assay for the detection of ovarian
cancer.

Second Series Of Experiments

Identification of a form of the CA125 ovarian cancer antigen
(MUC16B) lacking a transmembrane sequence

CA125 antigen is overexpressed in the majority of human
ovarian carcinomas and is released into the blood stream
where it can be detected with suitable immunological assays
(1). Approximately 80% of patients with ovarian cancer have
elevated serum CA125 levels and the measurement of these
levels is a valuable tool for monitoring the clinical status
of ovarian cancer patients (2,3).

Despite the widespread use of CA125 as a serum marker, until
recently, very little information was available on the
molecular nature of the CA125 antigen. Biochemical studies
had indicated that the antigen is a large, highly
glycosylated glycoprotein with mucin-like characteristics
(4-6). This suggestion has now been confirmed by the
molecular cloning of CA125 (gene designation: MUC16) by the
inventors (7,8) and O'Brien and coworkers (9). Both groups
reported a long DNA species that coded for a protein with a
large number of partially-conserved, 156 amino acid-long
tandem repeat (TR) sequences. These tandem repeats contain a
serine, threonine and proline-rich (S/T-rich) area that is a
potential region of O-glycosylation. The molecule also
contains a C-terminal non-TR region, a potential
membrane-spanning sequence and a short cytoplasmic tail. O'Brien et
al. (9) also reported a large N-terminal non-repetitive
S/T/P-rich region in CA125.

The presence of a membrane-spanning region in MUC16/CA125
raises the question as to the source of serum CA125 antigen.
One possibility is that cell-bound CA125 is cleaved by a
protease(s) and released into the surrounding medium. In
support of this mechanism is the presence in the molecule of
SEA motifs, which are possible protease-sensitive sites
(7,9). Another, not mutually exclusive, explanation is that
MUC16/CA125 is also synthesised as a form lacking a
transmembrane region that could be directly secreted from cells.

During the original cloning of MUC16/CA125 we had isolated a
small number of cDNA clones that appeared to differ from the
reported clone (B4) in having a different 3' nucleotide
sequence. We now show that these species represent a second
form of MUC16/CA125 lacking a C-terminal membrane-spanning
region that could be a secreted form of the antigen. This
species (gene designation: MUC16B) also has a long
serine/threonine-rich N-terminal sequence.

EXPERIMENTAL PROCEDURES
Materials and Methods

The isolation of cDNA clones B4, B30 and B22 in the pBK-CMV
vector has been described (7). Human tumor cell lines
OVCAR3, SK-OV-8, COLO316, 2774, SK-OV-3 and SK-OV-8 (ovarian
cancer cell lines), MCF-7 (breast cancer), IMR-32
(neuroblastoma), MKN45 (gastric cancer), and MCA (sarcoma)
and their CA125 status have been described (7).

RT-PCR procedure and cDNA sequencing

Messenger RNA was isolated from cell pellets using a
FastTrack 2.0 kit (Invitrogen Life Technologies, Carlsbad,
CA). cDNA was then synthesised using a Superscript First
Strand Synthesis kit as described by the manufacturer
(Invitrogen). RT-PCR was performed as follows: 2 µl cDNA,
0.2 mM dNTP mix, 4 mM MgCl2, 0.4 to 1 µM forward or reverse
primers and 2.5 U Platinum Taq DNA Polymerase (Invitrogen)
were mixed in a total volume of 50 µl and the samples were
cycled as follows: 94°C for 1 min., 25-35 cycles of 94°C for
30 secs, 54-65°C for 30 secs and 72°C for 30 secs to 3 min.,
and a final cycle of 94°C for 5 min. For the PCR of longer
products (> 5 kb) the LA PCR kit from Takara Shuzo Co. was
used under the following conditions: 94°C for 1 min., followed by
30 cycles of 94°C for 20 secs, 60°C for 30 secs and 72°C for
7 or 10 min., and a final cycle of 94°C for 20 secs, 55 or
60°C for 30 secs, and 72°C for 10 min. RT-PCR products were
analyzed by gel electrophoresis in 0.8 or 1.0% agarose in
Tris-acetate-EDTA and stained with ethidium bromide.
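
The long-product cycling conditions described in the preceding paragraph can be written out as data so that the total programmed hold time is easy to check. The following is a minimal sketch only; the function name program_seconds and its parameters are hypothetical, and the result ignores temperature ramp times, so an actual run on a thermal cycler would take longer.

```python
def program_seconds(initial, cycles, cycle_steps, final_steps):
    """Total programmed hold time (seconds) for a simple PCR program."""
    return initial + cycles * sum(cycle_steps) + sum(final_steps)


MINUTE = 60

# Long-product program from the text: 94 C for 1 min; 30 cycles of
# 20 s / 30 s / 7 or 10 min; final cycle of 20 s / 30 s / 10 min.
final_cycle = (20, 30, 10 * MINUTE)
with_7_min = program_seconds(1 * MINUTE, 30, (20, 30, 7 * MINUTE), final_cycle)
with_10_min = program_seconds(1 * MINUTE, 30, (20, 30, 10 * MINUTE), final_cycle)

print(f"7 min extension:  {with_7_min / 3600:.1f} h of hold time")
print(f"10 min extension: {with_10_min / 3600:.1f} h of hold time")
```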

For sequencing, the PCR product was cloned into the Topo TA
cloning vector (Invitrogen). Inserts were sequenced
initially with T3 and T7 primers and then with suitable
forward and reverse primers designed according to the
derived sequence. Sequencing was performed either by our own
sequencing facility or by the Cornell University Facility
using a BigDye Terminator Primer Sequencing Kit (Perkin
Elmer/ABI) in ABI 3700 or ABI 377 DNA sequenators. The
sequences were aligned visually for the repeat region
sequences and with the aid of Vector NTI for other sequences.

3' and 5' RACE procedures

These procedures were performed with the First Choice
RLM-RACE kit (Ambion Co., Austin TX) using suitable forward
primers for the 3' region and reverse primers for the 5' region,
respectively. For the 5' RACE the outer gene-specific primer
was 5'-TCACAGTCCCTACATTGACTA-3' and the inner primer was
5'-CATGGCACATCTCCAGGGT-3'. The products were cloned into TA
vector and sequenced as described above.
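
Because the 5' RACE gene-specific primers are reverse primers, the sense-strand sites they anneal to are their reverse complements. The following is a minimal sketch for deriving those sites and basic primer statistics from the two primer sequences given in the text; the helper names reverse_complement and gc_percent are hypothetical and not part of any kit software.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def gc_percent(dna):
    """Return the percentage of G and C bases in a DNA sequence."""
    dna = dna.upper()
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)


# 5' RACE gene-specific primers as given in the text (5' to 3').
primers = {
    "outer": "TCACAGTCCCTACATTGACTA",
    "inner": "CATGGCACATCTCCAGGGT",
}

for name, seq in primers.items():
    print(name, len(seq), "nt,", f"{gc_percent(seq):.1f}% GC,",
          "sense-strand site:", reverse_complement(seq))
```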

RESULTS

Cloning and sequencing of B30 cDNA 

During the original expression cloning of MUC16 (7) we
observed that the majority of the clones detected by
screening the cDNA library with a rabbit antiserum were
shorter forms of the longest clone (B4) reported (7) and
contained varying numbers of TRs, a non-TR region, a
potential TM region and a cytoplasmic tail. However, a few
clones were isolated that appeared to be different in that
they lacked a restriction enzyme site (Xho) present in the
B4 family of inserts. The cDNA from one of these clones
(B30) was completely sequenced using the T3 primer of the
vector initially and, subsequently, new forward and reverse
primers derived from the less conserved regions of the new
sequence. The B30 insert had a total of 4103 bp with a stop
codon at 3593 bp. This was followed by a 3' non-translated
region and, finally, a poly A sequence. Despite the presence
of a poly A sequence, no obvious polyadenylation site was
observed (Fig. 10). Clone B22 was partially sequenced and
shown to be a shorter (2432 bp) form identical to the 3'
sequence of B30.

WO 02/092836 PCT/US02/14768

Conceptual translation of the B30 sequence indicated a
protein composed entirely of 7.7 TRs of 156 amino acids
each. The 4.5 C-terminal repeats were identical to sequences
found in the B4 clone and three new partially-conserved TRs
were detected N-terminal to the B4 sequence. The new repeats
contained the potential cysteine loop, the 2 conserved
N-glycosylation sites and the serine/threonine-rich region
found in clone B4 of MUC16. No non-TR, transmembrane or
cytoplasmic sequences were present in this new species of
MUC16. Searching the NCBI database with this sequence
yielded two ESTs (BE005912 and BI016218) corresponding to
repeat number 3 in the B30 sequence. Surprisingly, no EST,
or even genomic, sequences corresponding to the
non-translated 3' region of B30 were detected in the NCBI
databases. In order to confirm that the new form of MUC16
was not a cloning artifact, 3' RACE was performed with RNA
from the OVCAR3 cell line. Sequences corresponding to the
last repeat and the untranslated region were identified
(data not shown). We also examined a panel of cancer cells
for transcripts corresponding to the 3' region by RT-PCR
using primers from repeat 8 and the 3' end of the
untranslated region of B30. PCR products were found only
with mRNA from cells known to express CA125, again
confirming the relationship of B30 to CA125.

Complete sequence of MUC16B/CA125 

Searching the NCBI genomic database with sequences derived
from B30 indicated that numerous sequences related to this
species were located on a genomic sequence file designated
NT_025133.6 (Fig. 13). At present (March 2002), this region,
located on chromosome 19p13.3/p13.2, consists of 53
unordered sequences of varying length. These data do not
allow the complete sequence of MUC16 to be easily assembled;
however, by designing suitable RT-PCR primers from the
genomic sequence it was possible to amplify and
sequence cDNA that extended the B30 sequence by 6.5 partially
conserved tandem repeat units (Figs. 11 and 12). This
results in the identification of a total of 14 repeats in
the new MUC16 sequence. Adjacent to the first exon of the
5'-most repeat sequence in NT_025133.6 we noticed a very
long potential open reading frame. This region does not
contain any repeat sequences but is rich in serine,
threonine and proline residues. Also, in NT_025133.6 we
observed a short putative exon containing the ATG sequence
suggested by O'Brien et al. (9) to be the initiating codon
of CA125 (Fig. 13). Again by designing suitable primers in
this region, PCR products corresponding to this new 5'
region were cloned and sequenced. The NCBI database contains
ESTs corresponding to portions of the 5' region of this
sequence (AK056791, AK056791 and AF41442). One of these ESTs
extended into the 5' region beyond the ATG designated by
O'Brien et al. (9). In fact, NT_025133.6 contains an
extremely long potential open reading frame (positions
176,053-179,693) corresponding to this region. The Celera
public access database also contains genomic sequence for
this region and, significantly, has an extremely long
hypothetical transcript sequence (hCT1645865) containing all
the putative exons in the 176,053-179,693 and 139,330-158,760
b.p. regions of NT_025133.6. Primers were also designed to
sequence these regions, and by application of RT-PCR to
OVCAR-3 mRNA it was possible to confirm these sequences.
Only minor differences were found between the
experimentally-derived sequence and the database sequences,
except for numerous differences between the published data
and our sequence in the 3' region of the serine/threonine-rich
region where it joins the tandem repeat region. This long
S/T/P-rich coding region
has numerous ATG codons which could serve as initiation
sites for mRNA synthesis (some of them fitting a Kozak
consensus motif, ref. 10), and it was difficult to pick a likely
site. Application of 5' RACE with a series of primers in
different locations in the sequence finally yielded a primer
that gave a clear cDNA product, and sequencing of this
product indicated a start site at position 261 (Figs. 11 and
12). This ATG is located in a classical Kozak box. To
confirm that the 5' S/T/P-coding region was in fact related
to the tandem repeat region and codes for the CA125 antigen,
we performed RT-PCR on mRNA from a panel of cell lines (as
we had done for the 3' end) with primers corresponding to a
sequence close to the 5' end; the result showed a complete
correlation between generation of the PCR product and
expression of CA125 in these cell lines.
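
The difficulty described in the preceding paragraph, choosing among many in-frame ATG codons, is often approached by scoring the sequence context of each ATG. The following is a minimal sketch only, assuming the commonly used simplification that the two most informative positions of the Kozak consensus are a purine at -3 and a G at +4 relative to the A of the ATG; the function name kozak_context, the three-level labels and the toy sequences are hypothetical and are not taken from Fig. 11.

```python
def kozak_context(seq, atg_index):
    """Classify the context of the ATG at 0-based atg_index.

    Assumption: only position -3 (purine) and position +4 (G) are scored.
    """
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    purine_at_minus3 = atg_index >= 3 and seq[atg_index - 3] in ("A", "G")
    g_at_plus4 = atg_index + 3 < len(seq) and seq[atg_index + 3] == "G"
    hits = int(purine_at_minus3) + int(g_at_plus4)
    return ("weak", "adequate", "strong")[hits]


# Hypothetical toy sequences; each has its ATG at index 6.
for toy in ("GCCACCATGG", "TTTATTATGT", "TTTTTTATGT"):
    print(toy, kozak_context(toy, 6))
```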

Conceptual translation of the assembled nucleotide sequence
(18405 bp) demonstrated a protein of 5851 amino acids with
an extremely long (3650 amino acids) S/T/P-rich N-terminal
region (containing 17.2% serine, 19.5% threonine and 9.0%
proline) followed by a region of 14 partially-conserved repeats of
156 amino acids each as described above (Fig. 12). The
sequence terminated after one of the S/T/P-rich regions in
the last TR, with no hydrophobic C-terminal transmembrane
region being observed.
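
Conceptual translation of a cDNA, as used in the preceding paragraph, reads codons from a chosen start position until the first in-frame stop codon. The following is a minimal sketch using the standard genetic code; the function name translate, its start parameter and the short test sequences are hypothetical, and the assembled 18405 bp sequence itself is not reproduced here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna, start=0):
    """Translate from 0-based start until the first in-frame stop codon.

    Codons containing characters other than A, C, G, T are reported as X.
    """
    cdna = cdna.upper()
    protein = []
    for i in range(start, len(cdna) - 2, 3):
        aa = CODON_TABLE.get(cdna[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy cDNA with its ATG at index 2.
print(translate("GGATGTCTACTCCATGA", start=2))
```

For a start site reported as a 1-based position in a figure, the start argument would be that position minus one.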

DISCUSSION

Using a combination of expression cloning and RT-PCR
approaches we have identified a new species of CA125
(designated MUC16B) that has a long serine/threonine-rich
N-terminal region and a C-terminal region of 14 tandem repeats
but no apparent transmembrane region. This product could
therefore be a secreted form of CA125, although no secretory
peptide sequence is present at the N-terminus. The tandem
repeat region is similar in construction to the repeats
previously observed in MUC16/CA125. These repeats contain a
small region rich in serine and threonine which could
represent O-glycosylation sites. The N-terminal region has
numerous serine and threonine residues scattered through the
sequence and these could also be O-glycosylated. CA125 is
known to be highly glycosylated (77% by weight) and most of
this consists of O-glycosylated chains (4). Two conserved
potential N-glycosylation sites occur in each tandem repeat
and these could also contribute to the carbohydrate content
of CA125, although this level is probably quite low (4).
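
Potential N-glycosylation sites of the kind mentioned in the preceding paragraph follow the Asn-X-Ser/Thr rule (X being any amino acid except Pro) and can be located with a simple pattern search. The following is a minimal sketch only; the function name n_glyc_sites is hypothetical, and the two inputs are conserved tandem-repeat motifs quoted earlier in this description, in which X marks a variable position that the pattern treats as a non-proline residue.

```python
import re

# Asn followed by any residue except Pro, then Ser or Thr. The lookahead
# keeps matches one residue long so that overlapping sequons are found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def n_glyc_sites(protein):
    """Return 1-based positions of Asn residues in Asn-X-Ser/Thr sequons."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


# Motifs quoted in the description; X is a variable position.
for motif in ("FTLNFTIXNL", "PGSRKFNXT"):
    print(motif, n_glyc_sites(motif))
```

A match identifies only a potential site; as the text notes, the low N-linked glycan content of purified CA125 suggests that few such sites are actually used.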

At present it is unclear whether the CA125 molecules
identified by the inventors (7,8) and O'Brien et al. (9) have
the same long N-terminal sequence. O'Brien et al. (9)
described an N-terminal sequence of 1638 amino acids in contrast
to the xxx amino acids described here for MUC16B. However, the
S/T/P-rich region was connected to the TR regions and the
non-TR, transmembrane and cytoplasmic regions similar to those
reported by us in MUC16/CA125. Using 5' RACE they detected an
initiating methionine (at position 6435 in Fig. 11) whereas we
could detect such a site only at position 262. Also unclear is
whether either of the N-terminal S/T/P-rich sequences is
present in the MUC16/CA125 species reported previously, as clone
B4 was not complete at the 5' end (7). We were unable to
generate products by performing RT-PCR with primers located in
the MUC16B repeat region and in the 3' portion of the MUC16 tandem
repeats not found in MUC16B, indicating that MUC16 and MUC16B
have different repeat sequences at their 5'-end and possibly,
therefore, a shorter or different S/T-rich region. Such a
situation may account for the larger number of repeats that
were identified by O'Brien et al. (9) and those that can be
found in the genome databases and not in MUC16B.

MUC16B/CA125 is an extremely long molecule with a peptide
chain of 5851 amino acids and an Mr of about 600,000. Many
other cloned mucins (11,12) also have extremely long peptide
sequences, e.g. MUC5B has 5662 amino acids and an Mr of
about 600,000 (13). By pulse-chase experiments we had
previously identified a putative CA125 precursor species of
about 400 kDa which, given the uncertainties inherent in
very high molecular sizes determined by SDS-PAGE, is
consistent with this result (5). It is also interesting to
note that the precursor consisted of a doublet of two
closely-spaced species on SDS-PAGE which could correspond to
MUC16 and MUC16B (5).
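
The approximate Mr values quoted in the preceding paragraph can be checked to order of magnitude from chain length alone. The following is a minimal sketch only, assuming the common rule-of-thumb average of about 110 Da per amino acid residue; that average is an assumption, not a value from this description, and an exact Mr would require summing the actual residue masses of the sequence.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb assumption, not from the text


def approx_peptide_mass_da(n_residues, residue_mass=AVERAGE_RESIDUE_MASS_DA):
    """Rough unglycosylated peptide mass (Da) from the number of residues."""
    return n_residues * residue_mass


# Chain lengths as stated in the text.
for name, length in (("MUC16B/CA125", 5851), ("MUC5B", 5662)):
    print(f"{name}: about {approx_peptide_mass_da(length):,.0f} Da")
```

Both estimates fall in the same range as the figure of about 600,000 given in the text; the estimate excludes carbohydrate, which the text reports as the majority of the mass of mature CA125.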

Although MUC16B/CA125 has many of the attributes expected of
a mucin species (i.e. large size, high serine, threonine and
proline content, high level of O-glycosylation and presence
of tandem repeats) it also has some unique features. These
include the presence of potential cysteine loops in the
repeat region and the segregation of the O-glycosylation
sites into a small region of each repeat. Another unusual
feature is that the repeat region is not coded by one long
exon; rather, each repeat unit contains 5 small exons
[O'Brien et al. (9) and our unreported data]. In CA125 the
longest exons are found at the 5' end and code for a
non-repeat serine/threonine-rich region. Because of its large
size, CA125 is extremely difficult to isolate in an intact
form from biological materials. In our original purification
of CA125 we described an extremely large species migrating
in the stacking gel of an SDS-PAGE gel (4), whereas
subsequently we found smaller species migrating mainly in
the upper region of the separating gel (7). Recently, in a
report from the Third ISOBM Workshop (14), it was reported
that CA125 can be degraded by sonication procedures, as well
as by proteolytic digestion.

Another feature of CA125 that still needs to be completely
elucidated is the location in the molecule of the
antibody-detected epitopes. Presently available data indicate that
they are mainly located in the tandem repeat regions of the
molecule (8, 9) and this would be consistent with the ability
of a single antibody to be useful in sandwich assays (1).
Further work on this problem will be needed to further
delineate the structures of the epitopes and to determine whether more
specific assays for CA125 can be devised. The molecular
cloning of CA125 also opens up approaches to determining the
function of CA125 and an understanding of its role in
ovarian malignancy.

1. An isolated nucleic acid molecule comprising sequences
encoding the CA125 protein or a portion thereof.

2. The gene encoding the CA125 protein. 

3. The isolated nucleic acid molecule of claim 1 
comprising sequence set forth in Figure 6 and the 
corresponding CA125 protein comprising sequence set 
forth in Figure 5. 

4. The isolated nucleic acid molecule of claim 1 
comprising sequence set forth in Figure 7 and the 
corresponding CA125 protein sequence set forth in 
Figure 8. 

5. The nucleic acid of claim 1 comprising sequence set 
forth in Figure 11. 

6. The nucleic acid of claim 1 encoding protein comprising 
at least a portion of the amino acid sequence set forth 
in Figure 12. 

7. The gene of claim 2 comprising sequence set forth in
Figure 10.

8. The isolated nucleic acid molecules of claim 1, 2, 3,
4, 5, 6, or 7, wherein the nucleic acid is RNA, cDNA,
genomic DNA, or synthetic DNA.

9. A vector comprising the nucleic acid molecule of claim
1, 2, 3, 4, 5, 6, 7, or 8.

10. The vector of claim 9, designated as pBK-CMV-B4
comprising sequence set forth in Figure 6 and the
corresponding CA125 protein comprising sequence set
forth in Figure 5.

11. The vector of claim 9, designated as pBK-CMV-B30
comprising sequence set forth in Figure 7 and the
corresponding CA125 protein comprising sequence set
forth in Figure 8.

12. The vector of claim 9, designated as pCMV-Tag-B4
comprising sequence set forth in Figure 6 and the
corresponding CA125 protein comprising sequence set
forth in Figure 5.

13. The vector of claim 9, designated as pCMV-Tag-B30 
comprising sequence set forth in Figure 7 and the 
corresponding CA125 protein comprising sequence set 
forth in Figure 8. 






14. An expression system comprising the vector of claim 9. 

15. The expression system of claim 14, wherein the system 
is a eukaryotic or prokaryotic system. 

16. A method for producing CA125 protein comprising the 
expression system of claim 14. 

17. An isolated nucleic acid molecule comprising sequence
capable of specifically hybridizing to the sequences of
claim 1 or 2.

18. The nucleic acid molecule of claim 17 capable of 
inhibiting the expression of the CA125 protein. 

19. A method of inhibiting expression of CA125 inside a
cell by vector-directed expression of an RNA able to
hybridize with the RNA of CA125.

20. The nucleic acid molecule of claim 17 or 18 which is at
least a 10mer.

21. The nucleic acid molecule of claim 17 or 18 which is at 
least a 20mer. 

22. A method to detect ovarian cancer in a subject
comprising steps of:

a) contacting the isolated nucleic acid molecule of
claim 17 with RNA from a sample from the subject
under conditions permitting the formation of a
hybrid complex, and

b) detecting the hybrid complex, wherein a positive
detection indicates the expression of the antigen
and presence of cancer.

23. A method of monitoring ovarian cancer therapy in a
subject comprising steps of: 

a) contacting the isolated nucleic acid molecule of 
claim 17 with RNA from a sample from the subject 
under conditions permitting the formation of a 
hybrid complex, and 

b) measuring the amount of the hybrid complex, 
wherein a decrease in the hybrid complex indicates 
the success of therapy. 

24. A method for inhibiting the expression of the CA125
protein comprising contacting an appropriate amount of 
the nucleic acid molecule of claim 17 or 18 so that 
hybridization of the gene or transcript encoding the 
CA125 protein will occur, thereby inhibiting the 
expression of the protein. 

25. A composition comprising the isolated nucleic acid 
molecule of claim 17 or 18. 



26. A vaccine for a cancer which expresses CA125 protein
comprising an appropriate amount of the isolated
nucleic acid molecules of claim 1 or 2.

27. A vaccine for a cancer which expresses CA125 protein
comprising an appropriate amount of an expression 
vector with the nucleic acid molecules which, when 
expressed, are capable of producing a product which 
induces an immune response to CA125 protein. 

28. The vaccine of claim 27, wherein the nucleic acid
molecule comprises sequences encoding human CA125
protein or a portion thereof.

29. The vaccine of claim 28, wherein the expressed human
sequence is linked to a carrier.

30. The vaccine of claim 27, wherein the nucleic acid
molecule comprises a nonhuman sequence.

31. The vaccine of claim 27, wherein the nucleic acid
molecule comprises a primate sequence.

32. The vaccine of claim 27, wherein the nucleic acid
molecule comprises a murine sequence.

33. The vaccine of claim 27, wherein the nucleic acid
molecule comprises a synthetic sequence, which, when
expressed, is capable of producing a product which
induces an immune response to CA125 protein.

34. The vaccine of claim 33, wherein the sequence
hybridizes with or is homologous to the sequences 
encoding human CA125 protein. 

35. The vaccine of claims 26-34, further comprising a 
suitable adjuvant. 

36. The vaccine of claims 26-34, wherein the adjuvant is an
alum.

37. The vaccine of claims 26-36, wherein the cancer is an
ovarian, pancreatic, breast, endometrial, or lung
carcinoma.

38. A method to treat a cancer which expresses CA125 in a
subject comprising administering to the subject an
appropriate amount of the vaccine of claims 26-36.

39. The method of claim 38, wherein the cancer is an
ovarian, pancreatic, breast, endometrial, or lung
carcinoma.

40. A vaccine for a cancer which expresses CA125 comprising
an appropriate amount of the expressed CA125 protein
corresponding to the sequence in claim 1.

41. A vaccine for a cancer which expresses CA125 protein
comprising an appropriate amount of a substance which
induces an immune response to CA125 protein.

42. The vaccine of claim 41, wherein the substance is a
polypeptide or a peptide.

43. The vaccine of claim 42, wherein the polypeptide
comprises sequences encoding human CA125 protein or a
portion thereof.

44. The vaccine of claim 43, wherein the expressed human 
sequence is linked to a carrier. 

45. The vaccine of claim 41, wherein the polypeptide 
comprises a nonhuman sequence. 



46. The vaccine of claim 45, wherein the polypeptide
comprises a primate sequence.

47. The vaccine of claim 45, wherein the polypeptide
comprises a murine sequence.

48. The vaccine of claim 42, wherein the polypeptide
comprises a synthetic sequence, which, when expressed,
is capable of producing a product which induces immune
response to CA125 protein.

49. The vaccine of claims 40-48, further comprising a
suitable adjuvant.

50. The vaccine of claim 49, wherein the adjuvant is an
alum.

51. The vaccine of claims 40-50, wherein the expressed 
protein is conjugated to a protein carrier to increase 
the immunogenicity. 

52. The vaccine of claims 40-51, wherein the cancer is an
ovarian, pancreatic, breast, endometrial, or lung
carcinoma.

53. A method to treat a cancer which expresses CA125 in a
subject comprising administering to the subject an
appropriate amount of the vaccine of claims 40-51.

54. A method to prevent a cancer which expresses CA125 in a
subject comprising administering to the subject an
appropriate amount of the vaccine of claims 40-51.

55. The method of claims 53 or 54, wherein the cancer is an
ovarian, pancreatic, breast, endometrial, or lung
carcinoma.

56. A method for the diagnosis of a cancer which expresses
CA125 by detecting CA125-expressing cells in the blood
or other fluids of patients based on the nucleic acid
sequence which encodes CA125.

57. A method for monitoring the therapy of a cancer which
expresses CA125 by measuring the expression of
CA125-expressing cells in the blood or other fluids of
patients based on the nucleic acid sequence which
encodes CA125, a decrease of either the number of
CA125-expressing cells or level of protein expression
in the cell, indicating the success of the therapy.

WO 02/092836 PCT/US02/14768

58. The method of claim 56 or 57, wherein the detection is
based on polymerase chain reaction with appropriate
primers.

59. A method of producing CA125 protein comprising steps of:

a) constructing a vector adapted for expression in a
cell which comprises the regulatory elements
necessary for expression of nucleic acid in the
cell operatively linked to the nucleic acid
encoding the CA125 protein so as to permit
expression thereof;

b) placing the cells of step (a) under conditions
allowing the expression of the CA125 protein; and

c) recovering the CA125 protein so expressed.

60. The method of claim 59, wherein the cell type is
selected from the group consisting of bacterial cells,
yeast cells, insect cells, and mammalian cells.

61. The CA125 protein expressed by the method in claim 59
or 60.

62. A method for production of antibodies against CA125
protein using the protein of claim 61.

63. Antibodies produced by the method of claim 62.

64. A method for monitoring the therapy of cancer which
expresses CA125 using the antibodies of claim 63.

65. A method of diagnosis of cancer which expresses CA125
using the antibodies of claim 63.

66. A method for determining the immunoreactive part of
CA125 comprising contacting antibodies which are known
to be reactive to CA125 with the protein of claim 61.

67. A transgenic nonhuman organism comprising the isolated
nucleic acid molecule of claim 1 or 2.

68. A transgenic nonhuman mammal of claim 67. 

69. A nonhuman organism, wherein the expression of CA125 is
inhibited. 

70. The nonhuman mammal of claim 69. 

71. The nonhuman mammal of claim 70, wherein the mammal is
a mouse.



72. A method for screening a compound for treatment of 
cancer which expresses CA125 protein comprising 
administering the compound to the transgenic nonhuman 
organism of claims 67-71, a decrease in expression of 
CA125 protein indicating that the compound may be
useful for treatment of the cancer. 
FIGURE 2

YYQSHLD LEDLQ * 

TACTACCAGTCACACCTAGACCTGGAGGATCTGCAATGACTGGAACTTGCC 5685 

GGTGCCTGGGGTGCCTTTCCCCCAGCCAGGGTCCAAAGAAGCTTGGCTGG 5736 

GGCAGAAATAAACCATATTGGTCGGAAAAAGGAAGGAGAATACAACGTCCA 5787 

GCAACAGTGCCCAGGCTACTACCAGTCCCCCCTAGACCTGGAGGATTTGCA 5838 

ATGACTGGAACTTGCCGGTGCCTGGGGTGCCTTTCCCCCAGCCAGGGTCC 5889 

AAAAAAGCTTGGCTGGGGCAAAAAIAMCCCATATTGGTCGGAAAAAAAAAA 5940 

AAAAAAAAAAAAAAAAAAAAAAAAA 5965 
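
The amino acid line at the top of Figure 2 can be checked against the first 39 nucleotides of the sequence line beneath it. The Python sketch below is illustrative only and is not part of the original disclosure. It assumes the standard genetic code, and the names CODON_TABLE, translate and FIGURE_2_ORF_END are hypothetical helpers introduced here.

```python
# Illustrative sketch (assumption: standard genetic code, table built in TCAG order).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every codon (e.g. "TAC") to its one-letter amino acid; "*" marks a stop codon.
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1, stopping after the first stop codon.

    Codons containing characters other than A, C, G or T are reported as "X".
    """
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[pos:pos + 3], "X")
        protein.append(amino_acid)
        if amino_acid == "*":
            break
    return "".join(protein)


# First 39 nucleotides of the first sequence line of Figure 2.
FIGURE_2_ORF_END = "TACTACCAGTCACACCTAGACCTGGAGGATCTGCAATGA"

if __name__ == "__main__":
    print(translate(FIGURE_2_ORF_END))
```

Under these assumptions the thirteen codons read Y, Y, Q, S, H, L, D, L, E, D, L, Q and stop, which agrees with the residues printed above the sequence in Figure 2.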
FIGURE 5

RVDPIGPGLDRERLYWELSQLTNSITELGPYTLDRDSLYVNGFNPWSSVPTTSTTGTS 

TVHIATSGTPSSLPGHTAPWLLIPFTLOTTITNLHYEENMQHPGSRKFNTrERVLQGL 

LKPLFKSTSVGPLYSGCRLTLLRPEKHGAATGVDAICTLRLDPTGPGLDRERLYWELS 

QLTNSVTELGPYTUDRDSLYVNGFTHRSSWTTSIPGTSAVHLETSGTPASLPGirrAPG 

PLLVPFTLl^TITmQYEEDlvnmPGSRKFNTTERVLQGLLKPLFKSTSVGPLYSGCRLT 

LLRPEKRGAATGVDTICTHRLDPLNPGLDREQLYWELSKLTRGIIELGPYLLDRGSLY 

VNGFTHRNFVP H'S ITGTSTVHLGTSETPSSLPRPrVTGPLL WFTLNFTITNLQ YEEAM 

RHPGSRKFNTTERVLQGLLRPIJKNTSIGPLYSSCRLTLLRPEKDKAATRVDAJCTfffl 

PDPQSPGIIJREQLYWEI^QLTHGITELGPYTLDRDSLYVDGFTHWSPIPTTSTPGTSIV 

NIXSTSGIPTSIPBTTATCPLLWFnJSOFTTIMXJYEBNM 

IJKSTSVGPLYSGCRLTLIRPEKDGVATRVDAICTHRPDPKIPGLDRQQLYWELSOLT 

HSITELGPYTLDRDSLYWGFTQRSSWTTSTPGTFTVQPETSETPSSLPGPTATGPVLL 

PFTLNFTIINLQYEEDMHRPGSRKFNTTERVLQGLLMPIJKNTSVSSLYSG 

EKDGAATRVDAVCTHRPDPKSPGLDRERLYWKLSQLTHGITELGPYTXDRHSLYVN 

GFTHQSSlSmTRTPDTSTl^ATSRTPASLSGPTTASPLLVIJTIWTm^YEE^ 

HPGSRKFNTTERVLQGLLRPWKNTSVGPLYSGCRLTLLRPKKDGAA1XVDAICTYR 

PDPKSPGLDREQLYWELSQLTHSriELGPYTLDRDSLYVNGFTQRSSVPTTSIPGTPTV 

DLGTSGTPVSKPGPSAASPLLVLFILNFTITNLRYEENMQHPGSRKFNTTERVLOGLL 

RSLFKSTSVGPLYSGCRLTLLRPEKDGTATGVDAICTHHPDPKSPRLDREQLYWELSO 

LTHNlTEmPYALDlTOSLFVNGFTHRSSVSTTSTPGTPTVYLGASKTPASIFGPSAASH 
LLILFTLNFTm^RYEENMWPGSPJO^^ 

RPEKDGEATGVDAICTHRPDPTGPGLDREQLYLELSQLTHSITELGPYTLDRDSLYYN 

GFTHRSSWTTSTGWSEEPFTLNFTINNLRYMADMGQPGSLKFMTONVMOHLLSPL 

FQRSSLGARYTGCRVIALRSVKNGAETRVDLLCTYLQPLSGPGLPIKQVFHELSOO'm 

GITRLGPYSLDKDSLYLNGYNEPGPDEPPTTPKPATTFLPPLSEATTAMGYHLKTLTL 

NFTISNLQYSPDMGKGSATFNSTEGVLQHLLRPLFQKSSMGPFYLGCQLISLRPEKDG 

AATGVDTTCTYHPDPVGPGLDIQQLYWELSQLTHGVTQLGFYVLDRDSLFINGYAPO 
^SIRGEYQINFTflVNWNLSNPDPTC 

NLTMDSVLVTVKAIJSSNLDPSLVEQWLDKTLNASFHWLGSTYQLVDIHVTEMESS 

WQPTSSSSTQHFYPNFTITmPYSQDKAQPGTIWQRNKRNIEDALNQLFRNSSIKS 

YFSDCQVSTFRSWNRHHTGWSLOTSPIARRVDRVAIYEEFIJIMIRNGTO 

LDRSSVLVDGYSPNRNEPLTGNSDIJ'FWAVIFIGIAGLLGLITCLICGVLVTTRRRKKE 
GEYNVQQQCPGYYQSHLDLEDL ^X"^ 
FIGURE 6

1 cgcgttgatc ccatcggacc tggactggac agagagcggc tatactggga gctgagccag 
61 ctgaccaaca gcatcacaga gctgggaccc tacaccctgg atagggacag tctctatgtc 
121 aatggcttca acccttggag ctctgtgcca accaccagca ctcctgggac ctccacagtg 
181 cacctggcaa cctctgggac tccatcctcc ctgcctggcc acacagcccc tgtccctctc
241 ttgataccat tcaccctcaa ctttaccatc accaacctgc attatgaaga aaacatgcaa 
301 caccctggtt ccaggaagtt caacaccacg gagagggttc tgcagggtct gctcaagccc
361 ttgttcaaga gcaccagcgt tggccctctg tactctggct gcagactgac cttgctcaga 
421 cctgagaaac atggggcagc cactggagtg gacgccatct gcaccctccg ccttgatccc
481 actggtcctg gactggacag agagcggcta tactgggagc tgagccagct gaccaacagc
541 gttacagagc tgggccccta caccctggac agggacagtc tctatgtcaa tggcttcacc 
601 catcggagct ctgtgccaac caccagtatt cctgggacct ctgcagtgca cctggaaacc 
661 tctgggactc cagcctccct ccctggccac acagcccctg gccctctcct ggtgccattc 
721 accctcaact tcactatcac caacctgcag tatgaggagg acatgcgtca ccctggttcc 
781 aggaagttca acaccacgga gagagtcctg cagggtctgc tcaagccctt gttcaagagc
841 accagtgttg gccctctgta ctctggctgc agactgacct tgctcaggcc tgaaaaacgt 
901 ggggcagcca ccggcgtgga caccatctgc actcaccgcc ttgaccctct aaaccctgga 
961 ctggacagag agcagctata ctgggagctg agcaaactga cccgtggcat catcgagctg 
1021 ggcccctacc tcctggacag aggcagtctc tatgtcaatg gtttcaccca tcggaacttt 
1081 gtgcccatca ccagcactcc tgggacctcc acagtacacc taggaacctc tgaaactcca 
1141 tcctccctac ctagacccat agtgcctggc cctctcctgg tgccattcac cctcaacttc 
1201 accatcacca acttgcagta tgaggaggcc atgcgacacc ctggctccag gaagttcaat 
1261 accacggaga gggtcctaca gggtctgctc aggcccttgt tcaagaatac cagtatcggc
1321 cctctgtact ccagctgcag actgaccttg ctcaggccag agaaggacaa ggcagccacc 
1381 agagtggatg ccatctgtac ccaccaccct gaccctcaaa gccctggact gaacagagag 
1441 cagctgtact gggagctgag ccagctgacc cacggcatca ctgagctggg cccctacacc 
1501 ctggacaggg acagtctcta tgtcgatggt ttcactcatt ggagccccat accaaccacc 
1561 agcactcctg ggacctccat agtgaacctg ggaacctctg ggatcccacc ttccctccct 
1621 gaaactacag ccaccggccc tctcctggtg ccattcacac tcaacttcac catcactaac 
1681 ctacagtatg aggagaacat gggtcaccct ggctccagga agttcaacat cacggagagt 
1741 gttctgcagg gtctgctcaa gcccttgttc aagagcacca gtgttggccc tctgtattct 
1801 ggctgcagac tgaccttgct caggcctgag aaggacggag tagccaccag agtggacgcc 
1861 atctgcaccc accgccctga ccccaaaatc cctgggctag acagacagca gctatactgg 
1921 gagctgagcc agctgaccca cagcatcact gagctgggac cctacaccct ggatagggac 
1981 agtctctatg tcaatggttt cacccagcgg agctctgtgc ccaccaccag cactcctggg 
2041 actttcacag tccagccgga aacctctgag actccatcat ccctccctgg ccccacagcc
2101 actggccctg tcctgctgcc attcaccctc aattttacca tcattaacct gcagtatgag 
2161 gaggacatgc atcgccctgg ctccaggaag ttcaacacca cggagagggt ccttcagggt 
2221 ctgcttatgc ccttgttcaa gaacaccagt gtcagctctc tgtactctgg ttgcagactg 
2281 accttgctca ggcctgagaa ggatggggca gccaccagag tggatgctgt ctgcacccat 
2341 cgtcctgacc ccaaaagccc tggactggac agagagcggc tgtactggaa gctgagccag 
2401 ctgacccacg gcatcactga gctgggcccc tacaccctgg acaggcacag tctctatgtc 
2461 aatggtttca cccatcagag ctctatgacg accaccagaa ctcctgatac ctccacaatg 
FIGURE 6
(cont.) 



2521 cacctggcaa cctcgagaac tccagcctcc ctgtctggac ctacgaccgc cagccctctc 
2581 ctggtgctat tcacaattaa cttcaccatc actaacctgc ggtatgagga gaacatgcat
2641 caccctggct ctagaaagtt taacaccacg gagagagtcc ttcagggtct gctcaggcct
2701 gtgttcaaga acaccagtgt tggccctctg tactctggct gcagactgac cttgctcagg
2761 cccaagaagg atggggcagc caccaaagtg gatgccatct gcacctaccg ccctgatccc 
2821 aaaagccctg gactggacag agagcagcta tactgggagc tgagccagct aacccacagc 
2881 atcactgagc tgggccccta caccctggac agggacagtc tctatgtcaa tggtttcaca 
2941 cagcggagct ctgtgcccac cactagcatt cctgggaccc ccacagtgga cctgggaaca 
3001 tctgggactc cagtttctaa acctggtccc tcggctgcca gccctctcct ggtgctattc 
3061 actctcaact tcaccatcac caacctgcgg tatgaggaga acatgcagca ccctggctcc
3121 aggaagttca acaccacgga gagggtcctt cagggcctgc tcaggtccct gttcaagagc 
3181 accagtgttg gccctctgta ctctggctgc agactgactt tgctcaggcc tgaaaaggat 
3241 gggacagcca ctggagtgga tgccatctgc acccaccacc ctgaccccaa aagccctagg 
3301 ctggacagag agcagctgta ttgggagctg agccagctga cccacaatat cactgagctg 
3361 ggcccctatg ccctggacaa cgacagcctc tttgtcaatg gtttcactca tcggagctct
3421 gtgtccacca ccagcactcc tgggaccccc acagtgtatc tgggagcatc taagactcca 
3481 gcctcgatat ttggcccttc agctgccagc catctcctga tactattcac cctcaacttc 
3541 accatcacta acctgcggta tgaggagaac atgtggcctg gctccaggaa gttcaacact
3601 acagagaggg tccttcaggg cctgctaagg cccttgttca agaacaccag tgttggccct
3661 ctgtactctg gctgcaggct gaccttgctc aggccagaga aagatgggga agccaccgga 
3721 gtggatgcca tctgcaccca ccgccctgac cccacaggcc ctgggctgga cagagagcag 
3781 ctgtatttgg agctgagcca gctgacccac agcatcactg agctgggccc ctacacactg 
3841 gacagggaca gtctctatgt caatggtttc acccatcgga gctctgtacc caccaccagc 
3901 accggggtgg tcagcgagga gccattcaca ctgaacttca ccatcaacaa cctgcgctac 
3961 atggcggaca tgggccaacc cggctccctc aagttcaaca tcacagacaa cgtcatgcag 
4021 cacctgctca gtcctttgtt ccagaggagc agcctggglg cacggtacac aggctgcagg 
4081 gtcatcgcac taaggtctgt gaagaacggt gctgagacac gggtggacct cctctgcacc 
4141 tacctgcagc ccctcagcgg cccaggtctg cctatcaagc aggtgttcca tgagctgagc 
4201 cagcagaccc atggcatcac ccggctgggc ccctactctc tggacaaaga cagcctctac 
4261 cttaacggtt acaatgaacc tggtccagat gagcctccta caactcccaa gccagccacc 
4321 acattcctgc ctcctctgtc agaagccaca acagccatgg ggtaccacct gaagaccctc 
4381 acactcaact tcaccatctc caatctccag tattcaccag atatgggcaa gggctcagct 
4441 acattcaact ccaccgaggg ggtccttcag cacctgctca gacccttgtt ccagaagagc
4501 agcatgggcc ccttctactt gggttgccaa ctgatctccc tcaggcctga gaaggatggg 
4561 gcagccactg gtgtggacac cacctgcacc taccaccctg accctgtggg ccccgggctg 
4621 gacatacagc agctttactg ggagctgagt cagctgaccc atggtgtcac ccaactgggc 
4681 ttctatgtcc tggacaggga tagcctcttc atcaatggct atgcacccca gaatttatca 
4741 atccggggcg agtaccagat aaatttccac attgtcaact ggaacctcag taatccagac 
4801 cccacatcct cagagtacat caccctgctg agggacatcc aggacaaggt caccacactc
4861 tacaaaggca gtcaactaca tgacacattc cgcttctgcc tggtcaccaa cttgacgatg 
4921 gactccgtgt tggtcactgt caaggcattg ttctcctcca atttggaccc cagcctggtg 
4981 gaecaastct ttctasataa eaeect«>ant <nv>+/taM»» ^ ctccacctac 
FIGURE 6
(cont.) 

5041 cagttggtgg acatccatgt gacagaaatg gagtcatcag tttatcaacc aacaagcagc 
5101 tccagcaccc agcacttcta cccgaatttc accatcacca acctaccata ttcccaggac 
5161 aaagcccagc caggcaccac caattaccag aggaacaaaa ggaatattga ggatgcgctc 
5221 aaccaactct tccgaaacag cagcatcaag agttattttt ctgactgtca agtttcaaca 
5281 ttcaggtctg tccccaacag gcaccacacc ggggtggact ccctgtgtaa cttctcgcca 
5341 ctggctcgga gagtagacag agttgccatc tatgaggaat ttctgcggat gacccggaat 
5401 ggtacccagc tgcagaactt caccctggac aggagcagtg tccttgtgga tgggtattct
5461 cccaacagaa atgagccctt aactgggaat tctgaccttc ccttctgggc tgtcatcttc 
5521 atcggcttgg caggactcct gggactcatc acatgcctga tctgcggtgt cctggtgacc
5581 acccgccggc ggaagaagga aggagaatac aacgtccagc aacagtgccc aggctactac 
5641 cagtcacacc tagacctgga ggatctgcaa tgactggaac ttgccggtgc ctggggtgcc 
5701 tttcccccag ccagggtcca aagaagcttg gctggggcag aaataaacca tattggtcgg 
5761 aaaaaggaag gagaatacaa cgtccagcaa cagtgcccag gctactacca gtccccccta 
5821 gacctggagg atttgcaatg actggaactt gccggtgcct ggggtgcctt tcccccagcc 
5881 agggtccaaa aaagcttggc tggggcaaaa ataaaccata ttggtcggaa aaaaaaaaaa 
5941 aaaaaaaaaa aaaaaaaaaa aaaaa 

// 
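
Claims 17, 20 and 21 recite a nucleic acid molecule, of at least 10 or 20 nucleotides, capable of specifically hybridizing to the CA125 sequence. One simple instance of such a molecule is the exact reverse complement of a stretch of the Figure 6 sequence. The Python sketch below is illustrative only and is not part of the original disclosure. The names COMPLEMENT, reverse_complement, antisense_oligo and FIGURE_6_START are hypothetical, and perfect Watson-Crick complementarity is an assumption made for the example, not a requirement stated in the claims.

```python
# Illustrative sketch: derive a DNA oligomer complementary to a target window.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string, written 5' to 3' in upper case."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def antisense_oligo(target: str, start: int, length: int) -> str:
    """Return the oligomer complementary to target[start .. start + length - 1].

    `start` is 1-based, matching the position numbering used in Figure 6.
    """
    if start < 1 or length < 1 or start - 1 + length > len(target):
        raise ValueError("window lies outside the target sequence")
    window = target[start - 1:start - 1 + length]
    return reverse_complement(window)


# Nucleotides 1-20 of the Figure 6 sequence.
FIGURE_6_START = "cgcgttgatcccatcggacc"

if __name__ == "__main__":
    print(antisense_oligo(FIGURE_6_START, 1, 10))  # a 10mer
    print(antisense_oligo(FIGURE_6_START, 1, 20))  # a 20mer
```

In practice, candidate oligomers would also be screened for specificity against other transcripts; that step is outside the scope of this sketch.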
FIGURE 7

ATGTTCAAGAACACCAGTGTCGGCCTTCTGTACTCTGGCTGCAGACTGACCTTGCTCA

GGCCTGAGAAGAATGGGGCAGCCACTGGAATGGATGCCATCTGCAGCCACCGTCTTG 

ACCCCAAAAGCCCTGGACTCAACAGAGAGCAGCTGTACTGGGAGCTGAGCCAGCTGA 

CCCATGGCATCAAAGAGCTGGGCCCCTACACCCTGGACAGGAACAGTCTCTATGTCA 

ATGGTTTCACCCATCGGAGCTCTGTGGCCCCCACCAGCACTCCTGGGACCTCCACAGT 

GGACCTTGGGACCTCAGGGACTCCATCCTCCCTCCCCAGCCCCACAACAGCTGTTCCT 

CTCCTGGTGCCGTTCACCCTCAACTTTACCATCACCAATCTGCAGTATGGGGAGGACA 

TGCGTCACCCTGGCTCCAGGAAGTTCAACACCACAGAGAGGGTCCTGCAGGGTCTGCT 

TGGTCCCTTGTTCAAGAACTCCAGTGTCGGCCCTCTGTACTCTGGCTGCAGACTGATCT 

CTCTCAGGTCTGAGAAGGATGGGGCAGCCACTGGAGTGGATGCCATCTGCACCCACC 

ACCTTAACCCTCAAAGCCCTGGACTGGACAGGGAGCAGCTGTACTGGCAGCTGAGCC 

AGATGACCAATGGCATCAAAGAGCTGGGCCCCTACACCCTGGACCGGAACAGTCTCT 

ACGTCAATGGTTTCACCCATCGGAGCTCTGGGCTCACCACCAGCACTCCTTGGACTTC 

CACAGTTGACCTTGGAACCTCAGGGACTCCATCCCCCGTCCCCAGCCCCACAACTGCT 

GGCCCTCTCCTGGTGCCATTCACCCTAAACTTCACCATCACCAACCTGCAGTATGAGG 

AGGACATGCATCGCCCTGGATCTAGGAAGTTCAACGCCACAGAGAGGGTCCTGCAGG 

GTCTGCTTAGTCCCATATrCAAGAACTCCAGTGTrGGCCCTCTGTACTCTGGCTGCAG 

ACTGACCTCTCTCAGGCCCGAGAAGGATGGGGCAGCAACTGGAATGGATGCTGTCTG 

CCTCTACCACCCTAATCCCAAAAGACCTGGGCTGGACAGAGAGCAGCTGTACTGGGA 

GCTAAGCCAGCTGACCCACAACATCACTGAGCTGGGCCCCTACAGCCTGGACAGGGA 

CAGTCTCTATGTCAATGGTTTCACCCATCAGAACTCTGTGCCCACCACCAGTACTCCT 

GGGACCTCCACAGTGTACTGGGCAACCACTGGGACTCCATCCTCCTTCCCCGGCCACA 

CAGAGCCTGGCCCTCTCCTGATACCATTCACTTrCAACTTTACCATCACCAACCTGCAT 

TATGAGGAAAACATGCAACACCCTGGTTCCAGGAAGTTCAACACCACGGAGAGGGTT 

CTGCAGGGTCTGCTCAAGCCCTTGTTCAAGAACACCAGTGTTGGCCCTCTGTACTCTG 

GCTGCAGACTGACCTTGCTCAGACCTGAGAAGCAGGAGGCAGCCACTGGAGTGGACA 

CCATCTGTACCCACCGCGTTGATCCCATCGGACCTGGACTGGACAGAGAGCGGCTATA 

CTGGGAGCTGAGCCAGCTGACCAACAGCATCACAGAGCTGGGACCCTACACCCTGGA 

TAGGGACAGTCTCTATGTCAATGGCTTCAACCCTTGGAGCTCTGTGCCAACCACCAGC 

ACTCCTGGGACCTCCACAGTGCACCTGGCAACCTCTGGGACTCCATCCTCCCTGCCTG 

GCCACACAGCCCCTGTCCCTCTCTTGATACCATTCACCCTCAACTTTACCATCACCAAC 

CTGCATTATGAAGAAAACATGCAACACCCTGGTTCCAGGAAGTTCAACACCACGGAG 

AGGGTTCTGCAGGGTCTGCTCAAGCCCTTGTTCAAGAGCACCAGCGTTGGCCCTCTGT 

ACTCTGGCTGCAGACTGACCTTGCTCAGACCTGAGAAACATGGGGCAGCCACTGGAG 

TGGACGCCATCTGCACCCTCCGCCTTGATCCCACTGGTCCTGGACTGGACAGAGAGCG 

GCTATACTGGGAGCTGAGCCAGCTGACCAACAGCGTTACAGAGCTGGGCCCCTACAC 

CCTGGACAGGGACAGTCTCTATGTCAATGGCTTCACCCATCGGAGCTCTGTGCCAACC 

ACCAGTATTCCTGGGACCTCTGCAGTGCACCTGGAAACCTCTGGGACTCCAGCCTCCC 

TCCCTGGCCACACAGCCCCTGGCCCTCTCCTGGTGCCATTCACCCTCAACTTCACTATC 

ACCAACCTGCAGTATGAGGAGGACATGCGTCACCCTGGTTCCAGGAAGTTCAACACC 

ACGGAGAGAGTCCTGCAGGGTCTGCTCAAGCCCTTGTTCAAGAGCACCAGTGTTG 
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==== 

ATrrrA^CTTCCCTC 

r^GCTGG^ACCCTACACGCTGGATAGGGACAGTCT 

Sgg^taccagtcaatggtatttg 
^gat^aaa^ 

AAAA 
FIGURE 8



MFKNTSVGIXYSGCRLTLLRPEKNGAATGMDAICSHRLDPKSPGLNREQLYWELS 

QLTHGKELGPYTLDRNSLYVNGFTHRSSVAPTSTPGTSTVXJLGTSGTPSSIJ'SPTT 

AWLLWFTLNFTnmQYGEDMRHPGSRKPNTTERVLQGLIX3PIJKNSSVGPLYS 

GCmSLRSEKDGAATGVDAICTHHLNPQSPGLDREQLYWQLSQMTNGIKELGPY 

TLDRNSLYVNGFTHRSSGLTTSTPWTSTVDLGTSGTPSPVPSPTTAGPLLVPFTLNF 

TnmQYEEDMHRPGSRKFNATERVLQGLLSPIFKNSSVGPLYSGCRLTSLRPEKDG 

AATGMDAVCLYHPNPKRPGLDREQLYWELSQLTHNITELGPYSLDRDSLYVNGFT 

HQNSWTTSTPGTSTVWATTGTPSSFPGHTEPGPIlJPFTEWITimHYEE^ 

PGSRKJ^TITERVLQGLLKTLFKNTSVGPLYSGCRLTLIJRPEKQEAAT^ 

VDPIGPGLDRERLYWELSQLTNSrTELGPYTLDRDSLYVNGFNPWSSVPTTSTPGTS 

TVHLATSGTPSSLPGHTAPWLUPFTIJ^lTimHYEENMQHPGSPJO^ 

GLLKPLFKSTSVGPLYSGCRLTLLRPEKHGAATGVDAICTLRLDPTGPGLDRERLY 

WEI^QLTNSVTELGPYTLDRDSLYVNGFTHRSSVPTTSIPGTSAVHLETSGTPASLP 

GHTAPGPLLWFTLNFTIimQYEEDMREDPGSRKFNTTERVLQGLLKPLFKSTSVGP 

LYSGCRLTLLRPEKRGAATGVDTICTHRLDPLNPGLDREQLYWELSKLTRGnELGP 

YIXDRGSLYVNGFTHRNFWITSTPGTSTVHLGTSETPSSLPRPrVPGPLLVPFTLNF 

TITbnLQYEEAMRHPGSRKFNTTERVLQGLLRPLFKNTSIGPLYSSCRLTLLRPEKDK 

AATRVDAICTHHPDPQSPGLNREQLYWELSQLTHGI'TELGPYTLDRDSLYVDGFTH 

WSPIPTTSTPGTSrVM.GTSGIPPSLPETTATGPLLVPFTLNFnTNLQYEENMGHPGS 

RKFNITESVLQGLLKPLFKSTSVGPLYSGCRLTLLRPEKDGVATRVDAICTHRPDPK 

IPGLDRQQLYWELSQLTHSITELGPYTLDRDSLYVNGFTQRSSVPTTSSEYSTDVPM 

APILQQT*QELTPIHKPLCPI^GRhfIEDTNYSPSPLPQLIRVPAEAPQAKIPMNSPSC 

WHYXP*EHXAPFIYEGFSSAPGTFTVQPETSETPSSLPGPTGKYQSMVFGAWLMS 

VMSVYTLLEHG**V*TSLSLFTQIJOvlEIHSKCSNHRSTNPVH*ALP 
TKKKKKX 
200 kDa

97 kDa 
68 kDa 
[ Clone B4-



Repeat A — Repeat B — Repeat C — Repeat D — Repeat E — Repeat F 

j 

-Repeat G — Repeat H 

VPMA PI LQQT * 

GTTCCCATGGCCCCAATCTTACAACAAACTTAGCAGGAGCTGACCCCTATTCA 

TAAGCCCTTATGTCCTTTCCATAAGGGAAGGAACATAGAGGACACAAATTATT 

CCCCTTCCCCACTGCCCCAGCTAATCAGAGTCCCAGCTGAAGCCCCACAGGCA 

AAAATCCCCATGAATAGTCCCTCCTGCTGGCATTACNTTCCATGAGAGCACNT 

TGCTCCTTTCACTGTTGAGGGCTTCTCCTCAGCTCCTGGGACTTTCACAGTACA 

GCCGGAAACCTCTGAGACTCCATCATCCCTCCCTGGCCCCACAGGTAAATACC 

AGTCAATGGTATTTGGAGCATGGTTGATGAGTGTAAACATCTCTGTTTATACTC 

TGTTAGAGCATGGTTGATGAGTGTAAACATCTCTGTCATTATTCACTCAACTAA 

AGATGGAAATTCATAGTAAATGTAGTAACCATAGGTCAACCAACCCAGTTCAT 

TGAGCACTGCCTCTGTATCAGGACCTGGATATACATCAGGGAACAAAAAAAA 
AAAAAAAAAA 
FIGURE 11



====== 



^AGGACAC^^ 
ScATA?CAC?AGACA?CTCTACT^ 

tcacIgaatca^agaaag^^ 

AGAC^G^od^ 

aScSgga^ 
SSgS 

trrtraaaracacctgtagtcaatgtagggactgtgatitataaacatctatcccctt 

SSg^tg^^^ 

CC^AG^AACAGAGAGAAGTGCrrCTCITrCT^ 

TAAGGTCTCCAGAACAGAAGCCCTCTCCTTAGGCAGAACATCCACCCCAGGTCCTGCT 

CAATCCACAATA^ 
?c!JcACGACAGGATCAGCAGA^^ 

ggtStcag^atctotcat^ 

^gIgXgta^cactcgtctcctctccgggtgaot 
IgaccSIScatg^ggacacaag 

GAATAT^CCTCA^GATGAGAGTCTGGCCACrrCTAA^ 

aattcagctttca^ 

G^A^C^TTCCTC^^^ICAGGCCTCCCAGAGCCATCCAAAGTGACATCTCCAATGG 
TCA^CTCrrcC^ACCATAAAAGACATTGT^ 

A^TC^AGGAGA^C^V^CTCAGCCACAAAGCCAAGCACTGTTCCTTACAAGGCACTCA 
CCCTGATCAGTCCACAATGTCACAAGACATATCCACTGAAGTGATCACCA 
FIGURE 11
(cont.)

GGCTCTCTACCTCCCCCATCAAGACAGAATCTACAGAAATGACCATTACCACCCAAACAGGT 

TCTCCTGGGGCTACATCAAGGGGTACCCTTACCTTGGACACrTCAACAACTTITATGTCAGG 

GACCCATTCAACTGCATCTCAAGGATTTTCACACTCACAGATGACCGCTCTTATGAGTAGAA 

CTCCTGGAGAGGTGCCATGGCTAAGCCATCCCTCTGTGGAAGAAGCCAGCTCTGCCTCTTTC 

TCACTGTCTTCACCTGTCATGACCTCATCTTCTCCCGTTTCTTCCACATTACCAGACAGCATC 

CACTCTrCTTCGOTCCTGTGACATCACTTCTCACCTCAGGGCTGGTGAAGACCACAGAGCTG 

TTGGGCACAAGCTCAGAACCTGAAACCAGTTCACCCCCAAATTTGAGCAGCACCTCAGCTGA 

AATACTGGCCACCACTGAAGTCACTACAGATACAGAGAAACTGGAGATGACCAATGTGGTA 

ACCTCAGGTTATACACATGAATCTCOTCCTCTGTCCTAGCTGACTCAGTGACAACAAAGGC 

CACATCTTCAATGGGTATCACCTACCCCACAGGAGATACAAATGTTCTCACATCAACCCCTG 

CCTTCTCTGACACCAGTAGGATTCAAACAAAGTCAAAGCTCTCACTGACTCCTGGGTTGATG 

GAGACCAGCATCTCTGAAGAGACCAGCTCTGCCACAGAAAAAAGCACTGTCCTTTCTAGTGT 

GCCCACTGGTGCTACTACTGAGGTCTCCAGGACAGAAGCCATCTCTTCTAGCAGAACATCCA 

TCCCAGGCCCTGCTCAATCCACAATGTCATCAGACACCTCCATGGAAACCATCACTAGAATT 

TCTACCCCCCTCACAAGGAAAGAATCAACAGACATGGCCATCACCCCCAAAACAGGTCCTTC 

TGGGGCTACCTCGCAGGGTACCTTTACCTTGGACTCATCAAGCACAGCCTCCTGGCCAGGAA 

CTCACTCAGCTACAACTCAGAGATTTCCACGGTCAGTGGTGACAACTCCTATGAGCAGAGGT 

CCTGAGGATGTGTCATGGCCAAGCCCGCTGTCTGTGGAAAAAAACAGCCCTCCATCTTCCCT 

GGTATCTTCATCTTCAGTAACCTCACCTTCGCCACTTTATTCCACACCATCTGGGAGTAGCCA 

ctcctctcctgtccctgtcacttctcttitcacctctatcatgatgaagck;cacagacatg^ 

GGATGCAAGTTTGGAACCTGAGACCACTTCAGCTCCCAATATGAATATCACCTCAGATGAGA 

GTCTGGCCGCTTCTAAAGCCACCACGGAGACAGAGGCAATTCACGTTTTTGAAAATACAGCA 

GCGTCCCATGTGGAAACCACCAGTGCTACAGAGGAACTCTATTCCTCTTCCCCAGGCTTCTC 

AGAGCCAACAAAAGTGATATCTCCAGTGGTCACCTCTTCCTCTATAAGAGACAACATGGTTT 

CCACAACAATGCCTGGCTCCTCTGGCATTACAAGGATTGAGATAGAGTCAATGTCATCTCTG 

ACCCCTGGACTGAGGGAGACCAGAACCTCCCAGGACATCACCTCATCCACAGAGACAAGCA 

CTGTCCTTTACAAGATGCCCTCTGGTGCCACTCCTGAGGTCTCCAGGACAGAAGTTATGCCC 

TCTAGCAGAACATCCATTCCTGGCCCTGCTCAGTCCACAATGTCACTAGACATCTCCGATGA 

AGTTGTCACCAGGCTGTCTACCTCTCCCATCATGACAGAATCTGCAGAAATAACCATCACCA 

CCCAAACAGGTTATTCTCTGGCTACATCCCAGGTTACCCTTCCCTTGGGCACCTCAATGACCT 

TTTTGTCAGGGACCCACTCAACTATGTCTCAAGGACTTTCACACTCAGAGATGACCAATCTT 

ATGAGCAGGGGTCCTGAAAGTCTGTCATGGACGAGCCCTCGCTTTGTGGAAACAACTAGATC 

TTCCTCTTCTCTGACATCATTACCTCTCACGACCTCACTrTCTCCTGTGTCCTCCACATTACTA 

GACAGTAGCCCCTCCTCTCCTCTrcCTGTGACTTCACTTATCCTCCCAGGCCTGGTGAAGACT 

ACAGAAGTGTTGGATACAAGCTCAGACKJCTAAAACCAGTTCATCTCCAAATTTGAGCAGCAC 

CTCAGTTGAAATACCGGCCACCTCTGAAATCATGACAGATACAGAGAAAATTCATCCTTCCT 

CAAACACAGCGGTGGCCAAAGTGAGGACCTCCAGTTCTGTTCATGAATCTCATTCCTCTGTC 
CTAGCTGACTCAGAAACAACCATA 
FIGURE 11
(cont.) 



a2Itc?ctSctctgatcagtccacgatgtctccagacatc^^^ 



SgaA^^aS 

G^Scl^AAAGTACTCA^CATCTGAGTACAGA^ 

??act5gcacagtcatgccttctctatcagaggccatgact^^ 
§caSagccS 

TM^OTCTCCACCAtT^ 

cSatSagagtcgacaccagtcttgggaca 
Sg^gga^ 

A^CA^CCAGGCAGGACATCTTCATCACrc^ 
rAGCTGGGAACACTGAC^ 

AGACC^jGTCAAAG^ 

SagSgaca^gaatcg^^^ 

rArrA^^^^^^G^ACCTrGACCACCAGTGTCTATACTCCCACTTTGGGAACACTGACTC 

COCTCAATGCATCAATGCAAATG^j 
AT^rT^rrrT^TGTTCCAGA^ 

TCACTGGTCTCTCGTTCTGGGGCAGAGAGAAGTCCGGTTATTCAAACTCTAGATGTTTCTTCT 
AGTGAGCCAGATACAACAGCTTCATGGGTTAT 
FIGURE 11
(cont.)

TAGACACTGTATCTTCCACAGCCACCAGTCATGGGCSCAGACGTCAGCrCAGCCATrCCAACA 

CTCACTCATCCTGCAGAGACCAGCTCAACTATTCCCAGAACAATCCCCAATTTTTCTCATCAT 

GAATCAGATGCCACACCTTCAATAGCCACCAGTCCTGGGGCAGAAACCAGTTCAGCTATTCC 

AATTATGACTGTCTCACCTGGTGCAGAAGATCTGGTGACCTCACAGGTCACTAGTTCTGGCA 

CAGACAGAAATATGACTATTCCAACTTTGACTCTTTCTCCTGGTGAACCAAAGACCATAGCC 

TCATTAGTCACCCATCCTGAAGCACAGACAAGTC^^ 

TOTGTATCACGGTTGGTGACCTCAATGGTCACCAGlTrGGCGGCAAA^ 

ATCGAGCTCTGACAAACTCCCCTGGTGAACCAGCTACAACAGmCA^ 

GCACAGACCAGCCCAACAGTTCCCTGGACAACTTCCATTTTTTTCCATAGTAAATCAGACAC 

CACACCTrCAATGACCACCAGTCATGGGGCAGAATCCAGTrCAGCTGTTCCAACTCCAACTG 

TTTCAACTGA(3GTACCAGGAGTAGTGACCCCTrTGGTCACCAGlTCTAGGG^ 

CAGTCATGGGGAAGAAGCCAGTTCTGCTATTCCAACTCCAACTGTTTCACCTGGGGTACCAG 

GA ^T GA 5 :CTCTCT ^^ 

AC I^ CTOTGGTGAA ^ 

GTTCTACJGGCAGTAACCAGTACAACTCTTCCAACTCTGACTCTTrCTCCT^ 

accacaccttcaatggccaccagtcatggggcaga^^ 

ACCTGAWTACCAGGAGTGGTGACCTC^^^ 

GTATTCCAACTCTGATTCTTTCTCCTGGTGAACTAGAAA 
ATGGGC3CAGAAGCCA<3CTCAGCTGTTC^^ 

GTGACCCCTCTGGTCACTAGTTCCAGGGCAGTGACCAGT^^^ 
TTCTTCTAGTGAGCCAGAGACCACACCTC^ 

^^ CTAACTGmCACCTGAGGTACCAGG ^^ 

GAGCAGTAACCAGTACAACTATTCCAACTCTGACTATTTCTrCTGATGAACCAGAGAC^ 
ACTTCATTGGTCACCCAmTCAGGCAAAGATGAm^^ 

OCTACTGTACAAGGGCTCKSTGACTTCACTGGTCACTAGITCTGGGTCAGAGAC^ 

ttcaaatctaactgttgcctcaagtcaaccagagaccatagactcatgggtcgctS 

^ C ^ CA S TGAA ™ ACACTATGCOTCTA ^ 

A^£S^^^ 
FIGURE 11
(cont.) 



I^Sgaga^ 

A^^Ar^rA^GGTCATCAGTTCTGGGA 



rrACCCA^TATACCAGGGGTAGTGACCTCACAGGTCACTAGTTCTGCA^ 
A^CTA^TCCAACTrrGACTCCTTCTCCTGGTGA^ 



ATrrTGGGACACAGACTGGCTrCACTGTTCCAATTCGGACTGTTCCCTCTAGTGAGrc 
CAGT 



ACAATGGCTTCCTGGGTCACTCATCCTCCACAGACCAGCACACCTG'I 



AGAGACCACArcOTATrGAGCACCCATCCCAGAACAGAGACAAG^ 

^A?Ar^G^SAAGTCCAACTGOTC^^ 
S^A?ArTArAGGOTAC^ 

t?gS™S 

lAArrA^TCATGGA^ 

^^p^5^^2a^^^caag^ataaccatcggtcctggatctccaccaccagcggttataa 
:< 

• ^CGCC c gagagag^ctg^agg^tctgc^aa^ 
gcagtggatgccatctgc 
FIGURE 11
(cont.) 

ACACATCGCCCTGACCCTGAAGACCTCGGACTGGACAGAGAGCGACTGTACTGGGAGCTGA 

GCAATCTGACAAATGGCATCCAGGAGCTGGGCCCTTACACCCTGGACCGGAACAGTCTCTAT 

GTCAATGGTTTCACCCATCGAAGCTCTATGCCCACCACCAGCACTCCTGGGACCTCCACAGT 

GGATGTGGGAACCTCAGH3GACTCCATCCTCCAGCCCCAGCCCCACGACTGCTGGCCCTCTCC 

TGATGCCGTTCACCCrCAACTTCACCATCACCAACCTGCAGTACGAGGAGGACATGCGTCGC 

ACTGGCTCCAGGAAGTTCAACACCATGGAGAGTGTCCTGCAGGGTCTGCTCAAGCCATTGTT 

CAA ^ CACCAG ^ 

AAGATGGGGCAGCCACTGGAGTGGATGCCATCTGCACCCACCGCCTTGACCCCAAAAGCCN 
TGGACTCAACAGGGAGCAGCTGTACTGGGAGCTAAGCAAACTGACCAATGACAITGAAGAG 
^^^ ACACC ^^CAGGAACAGTCTCTATGTCAAJGGTnCACCCATCAGA(Sc 

TCACCATCACCAACCTGCAGTATGGGGAGGACATGGGTCACCCTGGCTCCAGGAAGTTCAAC 
ACCACAGAGAGGGTCCTGCTGGGTCTGCTTGGTCCCATATTCAAGAACACCAGTGTTGGCCC 

TGGATGCCATCTGCATCCATCATCTTGACCCCAAAAGCCCTGGACTCAACAGAGAGCGGCTG 

GGAACAGTCTCTATGTCAATGGTTTCACCCATCGGACCTCTGTGCCCACCACCAGCACTCCT 
G ^? C J CCACAGTGGACOT ^ 

TGCTGGCCCTCTCCTGGTGCTGTTCACCCTCAACITCACCATCACCAACCTGAAGTATGAGG 

AGGACATGCATCGCCCTGKjCTCCAGGAAGTTCAACACCACTGAGAGGGTCCTGCAGACTCTG 

OTGGTCCTATGTTCAAGAACACCAGTGTTGGCCTTCTGTACTCTGGCTGC^ 

A SS^ AAAAGCCCTG ^^ 

TGGCATCAAAGAGCTGGGCCCCTACACCCTGGACAGGAACAGTCTCTATGTCAATGGlTrC^ 
CCCATTGGATCCCTGTGCCCACCAGCAGCCCTGGGACCTCCACAGTGGACCTTGGGTCAGGG 
ACTCCATCCTCCCTCCCCAGCCCCACAAGTGCTGCTGGCCCTCTCCT^ 
AACTCCACCATCACCAACCTGCAGTA^ 

TCAACACCACGGAGCGGGTCCTGCAGACTCTGGTTGGTCCTATGTTCAAGAACACCAGn^T 
GGCCrrCTGTACTCTGGCTGCAGACTGACCTrGCTCAGGTCCG^ 

CAGCTATACTGGGAGCTGAGCCAGCTGACCAATGKjCATCAAAGAGCTGGGCCCC^ 

TGGACAGGAACAGTCTCTATGTCAATGGTTTCACCCATTGGATCCCTGTGCCCACCAGCAGC 

A ?£™™ AC ™ CCACAG ™ A ^^ 

AACTGCTGGCCCTCTCCTGGTGCCGTTCACCCTCAACTTCACCATCACCAACCT 

AGGAGGACATGCATTGCCCTGGCTCCAGGAAGTrCAACACCACAGAGAGAGTCCTGCAGAG 
TCTGCTTGGTCCCATGTTCAAGAACACCAGT^^ 

C<nTGCTCAGGTCCGAGAA(K}ATGGAGCA(3CCACTGGAGTGGATGCCAI^ 
FIGURE 11
(cont.) 

CCTGGACTCAACAGAGAGCAGCTGTACTGGGAGCTGACKXAG^ 
AnrTnnGCCCCTACACCCTGGACAGGAACAGTCTCTATGTCAATGGTTTCACCCATCGGAGC 

cScaccaa^ctgcag™ 

CAGAGA^TCCT&C^ 

tactctgotgcagI^ 

TGCCATCTGCACCCACCACCTTAACCCTCAA^ 

cTTCCACA^rcGACCTO 

GWCCTCTCCTGGTGCCATTCACCCTAAA 

r^GCATCGCCCTGGA^CTAGGAAGTTCAACGCCACAGAGAGGGTCCTGCAG 
G^CCCGAGAAGG/^^ 

AAAAG^^TGGGCTGGAC^GAGAGCAGCTGTACTGGGAGCTAAGC 

?CA^GA^CTGGGCCCCTACAGCCT(3GACAGGGACAGTCrCTATGTC^ 

CAGAACTCTGTGCCCACCACCAGTACTCCTGGGACCTCCACAGTGTACTGGGCAACCACTGG 

gactoStcctccttccccggccacacagagcctggccctctcct 

rrn^CCATCACCA^ 

AGTWA^CCA^ 
CTA^CTGTCA^^ 

cctgSacctc^cacagtgcacctggcaacctct^ 

aSccctStccctctot^ 

aSaaSatgScaccctggttcca^ 

CTGCTCAAGCCCTCGTTCA 
FIGURE 11
(cont.)

CTTGCTCAGACCTGAGAAACATGGGGCAGCCACTGGAGTGGACGCCATCTGCACCCTCCGCCT 

TGATCCCACTGGTCCTGGACTGGACAGAGAGCGGCTATACTGGGAGCTGAGCCAGCTGACCAA 

CAGCGTTACAGAGCTGGGCCCCTACACCCTGGACAGGGACAGTCTCTATGTCAATGGCTTCAC 

CCATCGGAGCTCTGTGCCAACCACCAGTATTCCTGGGACCTCTGCAGTGCACCTGGAAACCTC 

TGGGACTCCAGCCTCCCTCCCTGGCCACACAGCCCCTGGCCCTCTCCTGGTGCCATTCACCCTC 

AACTTCACTATCACCAACCTGCAGTATGAGGAGGACATGCGTCACCCTGGTTCCAGGAAGTTC 

AACACCACGGAGAGAGTCCTGCAGGGTCTGCTCAAGCCCTTGTTCAAGAGCACCAGTGTTGGC 

CCTCTGTACTCTCKJCTGCAGACTGACOTGCTCAGGCCTGAAAAACGTGGGGCAGCCACCGGC 

GT(}GACACCATCTGCACTCACCGCCITGACCCTCTAAACCCTGGACTGGACAGAGAGCAGCrA 

TACTGGGAGCTGAGCAAACTGACCCGTGGCATCATCGAGCTGGGCCCCTACCTCCTGGACAGA 

GGCAGTCTCTATGTCAATGKjTTTCACCCATCGGAACTTTGTGCCCATCACCAGCACTCCTGGGA 

CCTCCACAGTACACCTAGGAACCTCTGAAACTCCATCCTCCCTACCTAGACCCATAGTGCCTG 

GCCCTCTCCTGGTGCCArrCACCCTCAACTTCACCATCACCAACTTGCAGTATGAGGAGGCCAT 

GCGACACCCTGGCTCCAGGAAGTTCAATACCACGGAGAGGGTCCTACAGGGTCTGCTCAGGCC 

CTTGTTCAAGAATACCAGTATCGGCCCTCTGTACTCCAGCTGCAGACTGACCrTGCTCAGGCCA 

GAGAAGGACAAGGCAGCCACCAGAGTGGATGCCATCTGTACCCACCACCCTGACCCTCAAAG 

CCCTGGACTGAACAGAGAGCAGCTGTACTGGGAGCTGAGCCAGCTGACCCACGGCATCACTG 

AGCTGGGCCCCrACACCCTGGACAGGGACAGTCTCTATGTCGATGGTTTCACTCATTGGAGCC 

CCATACCAACCACCAGCACTCCTGGGACCTCCATAGTGAACCTGGGAACCTCTGGGATCCCAC 

CTTCCCTCCCTGAAACTACAGCCACCGGCCCTCTCCTGGTGCCATTCACACT 

CACTAACCTACAGTATGAGGAGAACATGGGTCACCCTGGCTCCAGGAAGTTCAACATCACGGA 

GAGTGTTCTGCAGGGTCTGCTCAAGCCCTTGTTCAAGAGCACCAGTGTTGGCCCTCTGTATTCT 

GGCTGCAGACTGACCTTGCTCAGGCCTGAGAAGGACGGAGTAGCCACCAGAGTGGACGCCAT 

CTGCACCCACCGCCCTGACCCCAAAATCCCTGGGCTAGACAGACAGCAGCTATACTGGGAGCT 

GAGCCAGCTGACCCACAGCATCACTGAGCTGGGACCCTACACCCTGGATAGGGACAGTCTCTA 

TGTCAATGGTTTCACCCAGCGGAGCTCTGTGCCCACCACCAGCAGTGAGTATTCTACTGATGTT 

CCCATGGCCCCAATCTTACAACAAACTTAGCAGGAGCTGACCCCTATTCATAAGCCCTTATGT 

CCTTTCCATAAGGGAAGGAACATAGAGGACACAAATTATTCCCCTTCCCCACTGCCCCAGCTA 

ATCAGAGTCCCAGCTGAAGCCCCACAGGCAAAAATCCCCATGAATAGTCCCTCCTGCTGGCAT 

TA(3ITTCCATGAGAGCACNTTGCTCCTTTCACTGTTGAGGGCTTCTCCTCAGCTCCTGGGACTT 

TCACAGTACAGCCGGAAACCTCTGAGACTCCATCATCCCTCCCTGGCCCCACAGGTAAATACC 

AGTCAATGGTATTTGGAGCATGGTTGATGAGTGTAAACATCTCTGTTTATACTCTGTTAGAGC 

ATGGTTGATGAGTGTAAACATCTCTGTCATTATTCACTCAACTAAAGATGGAAATTCATAGTA 

AATGTAGTAACCATAGGTCAACCAACCCAGTTCATTGAGCACTGCCTCTGTATCAGGACCTGG 
ATATACATCAGGGAACAAAAAAAAAAAAAAAAAA 
FIGURE 12

MGTTYTMGETSVSISTSDFFETSRIQIEFrSSLTSGLRETSSSERISSATEGSTVIilEWSGATTEVSR 
TEVISSRGTSMSGPDQFnSPDBTEAITm^TSPMTESAESAITOTGSPGATSEGTLTLDTSTTTFW 
SGTHSTASPGFSHSEmTLMSRTPGDWWPSLPSVEEASSVSSSLSSPAMTSTSFFSTLPESISSSPH 
PVTALLTLGPVKTTDMLRTSSEPETSSPPNLSSTSAEILATSEVTKDREKIHPSSNTPVVNVGT^ 

BCHLSPSSVLADLVTIKFTSPMATOTLGNTSVSTSTPAFPETMMTQITSSLTS 

TERSASLSGMPTGATTKVSRTEAI^LGRTSTPGPAQSTOPEISTEmRISTPLTTTGSAEMTITPKT 

GHSGASSQGTFTLDTSSRASWPGTHSAATHRSPHSGMTTPMSRGPEDVSWPSRPSVEKTSPPSSL 

VSI^AWSPSPLYSTPSESSHSSPLRVTSLFTPVMMKTTDMLDTSLEPVTTSPPSMNITSDESLATS 

KATMETEMQLSENTAVTHMGTISARQEFYSSYPGLPEPSKVTSPMWSSTIKDrVSTTIPASSEITR 

ffiMESTSTLTPTPRETSTSQEMSATBCPSTWYKALTSATffiDSMTQVMSSSRGPSPDQSTMSQDIST 

EVITRLSTSPIKTESTEMTriTQTGSPGATSRGTLTLDTSTTFMSGTHSTASQGFSHSQW 

PGEWWI^HPSVEEASSASFSLSSPVMTSSSPVSSTLPDSIHSSSLPVTSLLTSGLVKTTELLGTSSE 

PETSSPPNLSSTSAEILATTEVTrDTEKLEMTNVVTSGYTHESPSSVLADSVTTKATSSMGITYPTG 

DTNVLTSTPAFSDTSRIQTKSKLSLTPGLMETSISEETSSATEKSTVI^SVPTGATIEVSRTEAISSS 

RTSIPGPAQSTMSSDTSMETITRISTPLTRKESTDMArrPKTGPSGATSQGTFTLDSSSTASWPGTH 

SATTQRFPRSVVTTPMSRGPEDVSWPSPLSVEKNSPPSSLVSSSSVTSPSPLYSTPSGSSHSSPVPVT 

SLFTSMMKATDMLDASLEPETTSAPNMNITSDESLAASKATTETEAIHVFENTAASHVETTS 

ELYSSSPGFSEPTKV1SPVWSSSIRDNMVSTTMPGSSGITPJEIESMSSLTPGLRETRTSQDITSSTET 

STVLYKMPSGATPEVSRTEVMPSSRTSIPGPAQSmSLDISDEWmSTSPIMTESAEriTrTQTGY 

SLATSOWLPLGTSMmSGTHSTMSQGI^HSEMT^MSRGPESLSWTSPRFVETTRSSSSLTSLP 

L1TSLSPVSSTLLDSSPSSPLP\^LILPGLVKTTEVLDTSSEPKTSSSPNLSSTSVEIPATSEIMTDTE 

KmPSSNTAVAKWTSSSVHESHSSVLADSETTITIPSMGrrSAVEDTTVFTSNPAFSETRPJPTEPT^ 

SLTPGFRETSTSEETTSrrETSAVLFGVPTSATTEVSMTEIMSSNRTHIPDSDQSTMSPDnTEVITRL 

SSSSMMSESTQMTITTQKSSPGATAQSTLTLATTTAPLARTHSTWPRFXHSEMITLMSRSPENPS 

WKSSPFVEKTSSSSSLI^LPWTSPSVSSTLPQSIPSSSFSVTSIiTPGMVKTTDTSTEPGTSLSPNLS 

GTSVEILAASEVTTDTEKIHPSSSMAVTNVGTTSSGHELYSSVSfflSEPSKATYPVGTPSSMAETSI 

STSMPANFETTGFEAEPFSHLTSGIJIKTNMSLDTSSVTPTNTPSSPGSTHLLQSSKTD 

PDWPPASQYTEIPVDIITPFNASPSITESTGITSFPESRFrMSVTESTHHI^TDLLPSAETlSTGTVMP 

SLSEAMTSFATTGWRAISGSGSPFSRTESGPGDATLSTIAESLPSSTPWFSSSTFTTTDSSTIPALH 

E^^SSSATPmVDTSLGTESSTTEGRLVMGTESSTTEGRLVMVSTLDTSSQPGRTSSSPmDTR^^ 

S\^LGTWSAYQWSLSTRLTRTDGIMEmTKIPNEAAHRGTIRPVKGPQTSTSPASPKGLHTGGT 

KRMETITrALKTTTTALKTTSRATLTTSVYTPTLGTLTPLNASMQMASTI^ 

PErrSSLATSLGAETSTALPRTTPSVFNRESEITASLVSRSGAERSPVIQTLDVSSSEPDTTASWVI 

HPAETIPWSKTTPNFFHSELDTVSSTATSHGADVSSAIPTMSPSELDALTPLVTISGTDTSTTFPTL 

TKSPHETETRTTWLTHPAETSSTIPRTIPOTSHHESDATPSIATSPGAETSSAIPIMTVSPGAEDLVTS 

QVTSSGTDRNMnPTLTDSPGEPKTIASLVTHPEAQTSSAIPTSTIS 

WO 02/092836    23/25    PCT/US02/14768

FIGURE 12
(cont.)

PAVSRLVTSMVTSLAAKTSTINRALTNSPGEPATTV^^ 

MTTSHGAESSSAVPTPWSTEWGVVTPLVTSSRAVISTTIPmTLSPGEPETTPSMATSHGEEASSA 

IPTPTVSPGWGVWSLVTSSRAWSTTIPE.TFSLGEPETTPSMATSHGTEAGSAVPTVLPEVPGM 

VTSLVASSRAVTSTTLPTLTLSPGEPETTPSMATSHGAEASSTVPTVSPEWGWTSLWSSSGVN 

STSIPTLILSPGELETTPSMATSHGAEASSAWTPTVSPGVSGVVTPLVTSSRAVTSTTIPILTLSSSE 

PETIPSMATSHGVEASSAVLTVSPEWGMVTFLVTSSRAVTSTTIPTLTISSDEP 

KNnSAIPTLGVSPTVQGLWSLVTSSGSETSAFSNLTVASSQPETTOSWAHPGTEASSVVPTLTVS 
TGEPFIMSLVTHPAESSSTLPRTTSRFSHSELDTMPSTVTC 

GRDISATFPTWESPHESEATASWVTHPAWSTTVPRTTPNYSHSEPDTTPSIATSPGAEATSDFra 
TVSPDWDMVTSQVTSSGTDTSITIPTLTLSSGEPETTTSFITYSETHTSSAIPTLPVSPDASKMLTSL 
VISSGTDSTTTFPTLTETPYEPETTAIQLIHPAETNTMWRTTPKFSHSKSDTTLPVA^ 
VSTTTISPDMSDLVTSLWSSGTDTSTTFPTLSETPYEPE 

DTAPSMVTSPGVDTRSGVPTTTIPPSffGWTSQVTSSATDTSTAIPTLTPSPGEPETTASSATHPGT 

QTGFTWIRTVPSSEPDTMASWVTHPPQTSTPVSRTTSSFSHSSPDATPVMATSPRTEASSAVLrn 

SPGAPEMVTSQITSSGAATSTTVPTLTHSPGMPETTALI^THPRTETSKTFPAST^ 

TIRPGAFrSTALPTQTTSSLFTLLWGTSRVDLSPTASPGVSAKTAPLSTHPGTBTSTMIPTSTLSLG 

LLEtTGLLATSSSAETSTSTLlLTVSPAVSGLSSASITTDKPQTVTSWNTETSPSVTSVGPPEFSRT 

VTGTTMTLIPSEMPTPPKTSHGEGVSPTTIL^TTMVEATNLATTGSSPTVAKT^ 

PLTTPGMSTLASESWSRTSYNHRSWISTTSGYNRRYWTPATSTPVTSTFSPGISTSSIPSSTAAT^ 
FMWFTLNFTITNLQYEEDM^ 

SATAVDAICTHRPDPEDLGLDRERLYWEL^NLTNGIQELGPYTLDRNSLYWGFTHRSSMPT^ 
PGTSTVDVGTSGTPSSSPSPTTAGPLLMPFTLl^^ 

LFKNTSVGPLYSGCRLTLLP^EKDGAATGVDAICTHRLDPKSXGLNREQLYWELSKLTNDBEELG 
PYTLDRNSLYVNGFTOQSSVSATSTPGTSTVDLRTC^ 

YGEDMGIIPGSRKF]OTm\^LGLLGPIFKNTSVGPLYSGCRLTSLRSEKDGAATGVDAICIHHLD 
PKSPGLNRERLYWELSQLTNGIKELGPYTLD^ 

SLPSPATAGPLLVLFTLNFnThOLYEEDMHRPGSR 

RLTLLRSEKDGAATGWAICTHtU^DPKSPGVDREQLYWELSQLTNGIKELGPYTLDRNSLYVNG 
FTHWIPVPTSSPGTSTVDLGSGTPSSLPSPTSAAGPLLWFTLNFriTNLQ 

ERVLQTLVGPMFKNTSVGLLYSGCRLTLLRSEKDGAATGVDAICmRLDPKSPGVDREQLYWEL 
SQLTNGIK^LGPYTLDRNSLYVNGFTHMPVPTSSTPGTSTVDLGSGTPSSLPSPTTAGPLLWFTL 
NFTITNLKYEEDMHCPGSPJaOTTFJIVLQSLLGPMFKNTSVGPLYSGCRLTIXRSE 

DAICTHRLDPK5PGVDREQLYWELSQLTNGIKELGPYTLDRNSLYVNGFTHQTSAPNTSTPGTST 
VDLGTSGTPSSLPSPTSAGPLLWFTLNFTITNIX^ 

SVGLLYSGCRLTLLPJ>EKNGAATGMDAICSHRLDPKSPGLNREQLYWELSQLTHGIKELGPYTLD 

RNSLYVNGFTHRSSVAPTSTPGTSTVDLGTSGTPSSLPSPTTAWLLVPFTLNFnimQYGEDMR 

HPGSRKFNTTERVLQGLLGPLFKNSSVGPLYSGCRLISLRSEKDGAATGVDAIC 
FIGURE 12
(cont.)

LYSGCRLTSLRPEKDGAATGMDAV(XYHPWKRPGLDREQLYWELSQLTHNTrELGPYSLDRJDS 

LYVNGFTHQNSVPTTSTPGTSTVYWATTGTPSSFPGHTE 